PREFACE


Some thirty years ago at The Carnegie Institute of Technology one of us 
(GHW) was a student of the other (EMP) in a course entitled Physical 
Measurements. The recitation covered most of the topics in this book, and 
the laboratory experiments were carried on as research projects in which 
the students found out for themselves what measurements were needed to 
verify apparently simple fundamental principles. They also determined 
the accuracy of their verifications and explained any discrepancies. For 
example, the better students would find that Newton's second law could 
not be verified by a car being pulled along a track without determining 
and taking account of both the friction and the moments of inertia of the 
wheels. Even then the verification was only good to within the accuracy 
of the measurements. Some time later EMP collected his notes into an 
unpolished text which was produced only in lithographed form for use in 
the course. 

Meanwhile, GHW was finding that he had been taught much that many 
of his colleagues did not know but needed to know about the production 
of desired numerical results from raw experimental data. Thus the teacher 
had already invested time and effort in a piece of work that was, compar- 
atively speaking, lying idle, and the student had discovered concrete evi- 
dence of the value of that work. Collaboration in the production of a text 
suitable for publication seemed natural. Much rewriting and expansion 
to include topics beyond the level of sophomores majoring in physics was 
undertaken, but the early version remains as the hard core of our early 
chapters. 

We consider this book to be not only a suitable text for undergraduates 
majoring in science or engineering but also a useful guide for professional 
experimentalists in the physical sciences. It starts on a very elementary 
level with discussions of the nature of the numbers one meets in measure- 
ments, of methods of numerical approximations, and of the use of graphs 
and so on, so that the reader will pass four chapters before he comes to one
entitled “Errors.” The degree of complexity increases fairly steadily 
throughout the book, so that the reader will find Chapter 12 much more 
demanding. These demands are only on his patience and facility, however; 
except for a knowledge of algebra, trigonometry, and elementary calculus, 
we have tried to make the book mathematically complete. This has been 
done by the inclusion of appendixes in which that knowledge is used as a 
basis for the development of mathematical results needed in the body of 
the text. 

The nature of the book is also indicated by its history. In particular, it 
is not a source book on the foundations of statistical theory. Although we 
have used the method of maximum likelihood, for instance, to justify the 
principle of the minimization of the squares of residuals as a device for 
obtaining the most probable values of parameters, we have not explored 
the method further. Although we have tried to make it clear that the values 
so obtained and especially the standard deviations computed for them can 
never be more than estimates, we have adopted the practice of referring 
simply to “standard deviation,” or “the best estimate of the standard de- 
viation” as being less confusing than the greater particularity that would 
be desirable in a theoretical work. Our purpose is to emphasize objective, 
practical, and appropriate numerical calculations rather than to present a 
study of the theory. 

Despite this disclaimer, the reader will find much of the book to be 
mathematical in nature. We hope to encourage the routine use of the sorts 
of calculations described here and we believe it necessary to this purpose 
that we gain the confidence of the reader by showing details of the processes 
by which various conclusions are reached. More than that, however, there 
are no recipes that will cover all contingencies. We hope that the inclusion 
of the mathematical detail will help the reader toward the application of 
the concepts to the development of the “recipes” he needs for each of his 
own particular problems. 

The reader will not find an enormous number of problems. We have in- 
cluded problems not so much for routine drill as for additional illustration 
and instruction. Many of them could have been used as examples in the 
text; we have made them problems in the belief that the reader will develop
a greater understanding of the concepts if he works them through himself. 

We are indebted to Blaisdell Publishing Company, Waltham, Mass., for 
permission to reprint the integral of the normal error function from Joseph 
B. Rosenbach et al., Mathematical Tables, p. 187 (1943). We are indebted
to the Literary Executor of the late Sir Ronald A. Fisher, F.R.S., Cam- 
bridge, to Dr. Frank Yates, F.R.S., Rothamsted, and to Oliver & Boyd 
Ltd., Edinburgh, for their permission to reprint Tables III and IV from 
their book, Statistical Methods for Research Workers (12th edition, 1954). 
We are also indebted to the Iowa State University Press for permission to
reprint Table 10.5.3 from Statistical Methods by George W. Snedecor (5th 
edition, 1956). 

Finally, we wish to acknowledge the patience and support of our wives, 
Ruth Pugh and Marty Winslow, the typing of early drafts by the latter, 
and the typing of the entire final manuscript by Eleanor Schenck. Every 
author knows that if it were not for assistance from relatives, friends, and 
his publisher, who does much more than manufacture a book, the produc- 
tion of a book would be well-nigh impossible. 


Pittsburgh, Pennsylvania E.M.P. 

Argonne, Illinois G.H.W. 

April 1966 
CHAPTER 1

INTRODUCTION


The object of a study of physical measurements is to develop the thinking 
and reasoning powers, to furnish the special kinds of mental tools, and to 
develop the special kind of mental attitude that result from properly 
making and analyzing various kinds of measurements with particular 
regard to their precision and accuracy. The special attitude referred to is 
the full, operating realization that accuracy itself can be made a subject 
of measurement, that there are relative degrees of accuracy, that accuracy 
is important in one place and means only a waste of effort in another, that 
absolute accuracy is an impossibility, and that a measurement by itself 
is of much less value than when accompanied by a statement of its precision. 

By far the principal part of this book is devoted to the presentation of 
these mental tools and, indeed, to their presentation in ways which we 
hope will be conducive to the growth of the above-mentioned attitude. 
This leaves only a small part of the book focused more specifically on the 
mechanics of taking measurements, but it should be emphasized that, 
while the authors hope the book will find good use in the libraries of working 
scientists, its principal use is seen as a text in a course which includes 
laboratory work. When a course in physical measurements is given in 
conjunction with laboratory work or practical work in the physical sciences, 
its value is enhanced in that it gives the student a certain familiarity with 
apparatus and a facility in handling it. He acquires the habit of using it 
to the best advantage and of keeping it in such condition that it can be 
fully utilized when needed. Furthermore — and this is equally important — 
the laboratory work provides the student with measurements of his own 
on which to practice his course work, under conditions which are ideal 
for properly impressing him with the course material. While the authors 
have no evidence of the effectiveness of such a course without laboratory 
work, they do know that the course with laboratory, as taught for several 
years at the Carnegie Institute of Technology, was very effective. 

1-1 THE ANALYSIS OF OBSERVATIONS

Modern science and technology are based on scientific experiments in- 
volving measurements. Carefully designed experiments carefully analyzed 
have produced a body of scientific facts that are not and cannot be questioned.
For example, the experimental evidence on which Isaac Newton
(1643-1727) based his famous laws of motion and of universal gravitation
is still valid. The results obtained from later experiments, which showed
that Newton’s laws are far more accurate than the original evidence on 
which they were based, are also accepted by modern scientists. Early in 
the twentieth century new data obtained with even more accurate measure- 
ments had become available. These new measurements led Albert Einstein 
(1879-1955) to his general theory of relativity, which has superseded 
Newton’s laws. It is important to note that the measurements which 
established Newton’s laws are still valid. In fact, they helped to establish 
Einstein’s theory, for which Newton’s laws are an excellent approximation 
at the velocities involved in those measurements. Many similar examples 
can be found in the history of science. 

While science does contain a large body of unquestionable facts, there 
are many phenomena in nature which are not well understood primarily 
because of the lack of sufficiently well-designed experiments with properly 
analyzed results. Many concepts and quantities can be established by two 
or more very different types of investigation. The regularity of nature 
is such that, when two different scientists carry out different types of 
experiments to establish a given concept, they must arrive at the same 
conclusion provided that they are both sufficiently competent scientists 
and each carries out a sufficiently careful analysis of his data. Nevertheless 
and unfortunately, scientific journals and books contain many instances 
where one statement concerning a given concept conflicts with another. 
Such difficulties emphasize the importance of the phrases “sufficiently
competent scientists” and “sufficiently careful analysis.” Careful analysis
sometimes reveals that, although two types of experiments were designed 
to measure the same quantity, they do in fact measure different quantities. 

Conflicting statements in the scientific literature are frequently due to a 
not unnatural desire on the part of some beginning scientists to make 
unusual discoveries early in their lives. When such scientists obtain un- 
expected experimental data they may seize upon unusual explanations 
when well-established principles are sufficient for their analyses. Such 
scientists should remember that important new principles are rarely dis- 
covered in this manner. New principles have usually been discovered 
while carrying out well-planned research programs on fairly well-known 
phenomena. Wilhelm K. Röntgen (1845-1923) discovered x-rays during
his well-designed attempts to improve on the cathode ray experiments
of Lenard and William Crookes (1832-1919). Max Planck (1858-1947)
originated the quantum theory while he was attempting to improve on 
J. W. S. Rayleigh’s (1842-1919) theory for blackbody radiation. Planck 
modified Rayleigh’s treatment by expressing it in terms of small units of 
energy. He was amazed to discover that his theory agreed with the experi- 
ment when and only when he did not pass to the limit of making these 
units of energy infinitesimal. Since the concept of energy radiation only 
by finite amounts, or quanta, was so radical, Planck spent nearly ten years 
attempting to solve the problem without this radical assumption. These 
ten years of serious but unsuccessful attempts to use previously well- 
established principles provided one of the strongest arguments for the 
acceptance of the new quantum theory. 

Our purpose is to present and discuss modern methods for obtaining 
reliable experimental results. The first basic step is to distinguish between 
errors and mistakes. Mistakes generally are due to some form of careless- 
ness on the part of the experimenter. Errors are inherent in the measuring 
techniques and require special methods for their elimination or reduction. 
While there are many types of errors, they are usually divided into two 
main classifications: systematic errors and chance errors. 

Systematic errors are due to definite discoverable phenomena. The 
period of a simple pendulum depends on the length of the pendulum and 
the gravitational acceleration. Such a system is frequently used to measure 
the latter quantity. If the experimenter uses a cord and measures its 
length before it has been stretched by the weight of the bob, he will be 
making a systematic error. This error can be eliminated by measuring the 
length of the cord after the pendulum is hung, or the data could be cor- 
rected for it by estimating the change in length from the weight of the 
bob and the elastic constants of the cord.* 
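
The size of such a systematic error can be estimated numerically. The following Python sketch assumes the ideal small-angle relation T = 2π√(L/g); the cord length, the amount of stretch, and the value of g are illustrative numbers chosen for this example and are not data from the text.

```python
import math

def pendulum_period(length_m, g=9.80):
    """Period of an ideal simple pendulum (small-angle approximation)."""
    return 2.0 * math.pi * math.sqrt(length_m / g)

def g_from_period(length_m, period_s):
    """Gravitational acceleration inferred from a length and a period."""
    return 4.0 * math.pi ** 2 * length_m / period_s ** 2

# Hypothetical numbers: the cord is 1.000 m long when unloaded and
# stretches by 2 mm once the bob is hung on it.
unstretched = 1.000
stretched = 1.002
true_g = 9.80

# The pendulum actually swings with its stretched length ...
period = pendulum_period(stretched, true_g)

# ... but the experimenter uses the length measured before loading.
g_biased = g_from_period(unstretched, period)
g_corrected = g_from_period(stretched, period)

relative_bias = (g_biased - true_g) / true_g
print(f"g from unstretched length: {g_biased:.4f} m/s^2")
print(f"g from stretched length:   {g_corrected:.4f} m/s^2")
print(f"relative bias: {relative_bias:.2%}")
```

Under these assumptions the bias always has the same sign and the same size, however many swings are timed, which is what distinguishes it from the scatter discussed next.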

When all of the known systematic errors have been eliminated or cor- 
rected for, there usually remains in the data a scatter which is produced 
by unknown causes. To continue with the above example, air currents 
may affect the motion or one may not always judge the end of a swing in 
the same way, and so on. In length measurements that are not expected 
to be extremely precise one frequently judges coincidence by touch, and 
touches will vary. As stated above, the exact causes are unknown, though 
they are usually judged to be of the sort just described. 

A system of statistical methods has been developed for dealing with 
these chance errors. Properly used, these methods provide powerful tools 
for analyzing data. They are ideally suited to modern computer techniques,
but they are also very useful where only desk calculators are
available.

* It would seem obvious that in this example it is simpler to eliminate the error
than to correct for it.

The fact that computing machines of all kinds are so useful with these 
statistical methods has led to some misuse of the latter. It is often easier 
to put the data on a machine than it is to thoroughly analyze the experi- 
ment for systematic errors. Such misuse of the statistical methods has 
led some scientists to discard all statistical treatments. This is very un- 
fortunate because properly used statistical methods can improve the 
analysis of any physical experiment. 


1-2 RECORDING 

The remainder of this chapter is devoted to introductory material
on proper laboratory operations.

Formal data sheets or notebooks should be used to record all observations,
numerical or otherwise. They should also list the important instruments used.
Under no circumstances should observations be recorded elsewhere and 
none should be erased. The data sheet is the “notebook of the scientist,” 
which, like that of an accountant, should be a book of “original entry” 
for reasons that are as important on the one hand for scientific investigation 
of natural phenomena as, on the other, for legal investigation of indebted- 
ness. If there is reason for doubting the value of any entry, or even if it is 
obviously wrong, it may be canceled by drawing a line through it; but this
should be done in such a way as not to obscure what is written but to 
permit it to be utilized later if it is found desirable to do so. 

The advantages of keeping full, complete records cannot be too strongly 
emphasized. We might almost say that one should write it down every 
time he turns around. It is true that most entries of this kind will never 
be referred to again, but that is sufficiently compensated for by finding a 
much desired record of some odd little fact, many months after the event, 
that a less methodical person would not have bothered to note down. 


1-3 ESTIMATION OF TENTHS 

It sometimes happens that a measurement requires only a slight degree of 
accuracy, and time and trouble can be saved by making it only roughly. 
As a general principle, however, it is advantageous to make all measure- 
ments as accurately as possible. Thus measurements of lengths made with 
the meter stick should be expressed not merely to the nearest millimeter 
but to the nearest tenth of a millimeter. It is not difficult to imagine each 
of the millimeter intervals of the scale divided into tenths and to make a 
fairly good estimate of the number of tenths included in the length to be 
measured. When experienced observers make mental subdivisions into
tenths of the smallest intervals on a graduated scale their estimates seldom
differ from one another but for the beginner the estimation of tenths may 
be an uncertain process. To gain proficiency it is better to begin with larger 
subdivisions, such as a scale of centimeters that has no millimeter marks. 
The position of a mark placed at random on such a scale can be estimated 
mentally and the accuracy of such a determination tested by actual 
measurement with a more finely divided scale. 



CHAPTER 2

ACCURACY


It has become reasonably common practice to distinguish between precision 
and accuracy. This convention will be adopted in this book. 

2-1 PRECISION VS. ACCURACY 

The word “precision” will be related to the random error distribution 
associated with a particular experiment or even with a particular type of 
experiment. The word “accuracy” shall be related to the existence of 
systematic errors — differences between laboratories, for instance. For 
example, one could perform very precise but inaccurate timing with a 
high-quality pendulum clock that had the pendulum set at not quite the 
right length. 

As another example, two different laboratories could make comparisons 
of masses for which each uses a beam balance of the same model made by 
the same manufacturer. The precision at the laboratories would very 
probably be the same, as nearly as one could tell, but it is also probable 
that there would be a difference in the ratio of the arms on the two balances 
that could be demonstrated by proper procedures. One of these balances 
would certainly have a ratio nearer unity than the other, and unless 
appropriate corrections are made on the latter, the results obtained with 
it would be less accurate. 

Finally, it should be emphasized that references to precision or accuracy 
are pertinent to the absolute size of the error involved and not to the 
fractional error. Confusion on this point leads to confusion in the concept 
of the weight of an observation; i.e., that some observations should be 
given greater consideration than others. Thus a pressure of 0.5 atmo- 
sphere (atm) known to 0.1 atm is known more precisely than is a pressure 
of 10 atm known to 0.5 atm. On the other hand, the logarithm of the 
second pressure is more precise than is the logarithm of the first. 
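
The statement about logarithms can be checked with a short calculation. The Python sketch below assumes natural logarithms and the usual first-order estimate that the error in ln p is approximately Δp/p; the function names are ours and are introduced only for illustration.

```python
import math

def log_uncertainty(value, abs_error):
    """First-order estimate of the absolute error in ln(value)."""
    return abs_error / value

def log_half_spread(value, abs_error):
    """Half the spread of ln over [value - abs_error, value + abs_error]."""
    return 0.5 * (math.log(value + abs_error) - math.log(value - abs_error))

# (pressure, absolute error), both in atmospheres, as in the text.
pressures = [(0.5, 0.1), (10.0, 0.5)]

for p, dp in pressures:
    print(f"p = {p} atm known to {dp} atm: "
          f"error in ln p about {log_uncertainty(p, dp):.3f} "
          f"(direct evaluation {log_half_spread(p, dp):.3f})")
```

The first pressure has the smaller absolute error, but the second has the smaller error in its logarithm, because the error in a logarithm follows the fractional error of the quantity.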

Many of the remarks to be made apply equally to precision and accuracy.
Where this is the case, no attempt to distinguish between them will be 
made. 


2-2 IMPOSSIBILITY OF DETERMINING TRUE VALUES 

Beginning students in technical courses have some difficulty in grasping 
the idea that accuracy is always a relative matter and that absolute preci- 
sion of measurement is an impossibility, because they have had little 
practice in careful measurement and their previous study of arithmetic has 
emphasized infinite accuracy in numerical values. Such a number as 12.8 
has been supposed not only to mean the same thing as 12.80 but also to be 
equal to 12.800000 . . . , to an unlimited number of decimal places. This 
is quite proper and satisfactory so long as one realizes that he is dealing 
with ideal quantities: perfections of measurement which have no more 
reality of existence than the point, line, plane, or cube of geometry. The 
smoothest surface of a table does not come as near to being a plane as does 
the surface of an “optically worked” block of glass or a “Whitworth plane,”
and even the smoothest possible surface can be magnified to show that it 
contains irregularities everywhere. If it were magnified enough, we could 
see that its shape would not even remain constant, but individual mole- 
cules would be found swinging back and forth or escaping from the surface. 
A geometrical plane certainly corresponds to nothing in reality and, in 
general, perfect accuracy of number is just as much an imaginary concept. 


2-3 DECIMAL ACCURACY 

If 12.8 cm, as a measurement, does not mean the ideal number 
12.800000000 ... to an infinite number of decimal places, what does it 
mean? Since different measurements are likely to be made with different 
degrees of accuracy, the universally adopted convention is merely the 
common-sense one that the statement of a measurement must be accurate 
as far as it goes; and it should go far enough to express the accuracy of the 
determination. Thus “12.8 cm” means a length that is nearer to precisely 
12.80 . . . than to precisely 12.70 or 12.90 cm; i.e., that its “rounded-off” 
value would be 12.8 cm, not 12.7 or 12.9. If a length is written “12.80 cm,”
however, the implication is that the stated measurement is nearer to this
same precise 12.800000 . . . than it is to either 12.79 or 12.81 cm. In other
words, it is implied that it has been measured to hundredths of a centimeter
and found to be between 12.795 and 12.805 cm, so that it can properly be
rounded off to 12.80. The previous description, “12.8 cm,” means that the
measurement is between 12.75 and 12.85; it says nothing about hundredths
of a centimeter, and can correctly represent any length between the limits
just given, for example 12.75, 12.76, 12.77, 12.78, 12.79, 12.81, 12.82, 12.83,
or 12.84,* for each one of these could be rounded off to 12.8. To write the
length 12.8 cm in the form “12.80 cm” would be to violate the rule that a 
statement should be accurate as far as it goes, for it would go as far as 
hundredths, and the chances are ten to one that it would be one of the 
other numbers of hundredths given above. On the other hand, if an ob- 
server determined a length to be 12.80 cm, that is, if he looked for 
hundredths and established the fact that there were none of them, then to 
state the result only as 12.8 cm would not be doing justice to his own 
measurement, for he would imply that the correct number of tenths was 
merely known to be nearer 8 than 7, namely greater than 7.5, whereas 
he had actually found it to be nearer 8 than 7.9 — that is, greater than 
7.95. 
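
The convention just described can be expressed as a small routine that takes a measurement exactly as written and returns the interval it implies. This is only a sketch of the convention, written in Python with the standard decimal module; the function name is ours.

```python
from decimal import Decimal

def implied_interval(written):
    """Return (low, high) implied by a measurement written as a decimal string."""
    value = Decimal(written)
    # Exponent of the last written digit, e.g. -1 for "12.8" and -2 for "12.80".
    exponent = value.as_tuple().exponent
    # Half a unit in the last written place.
    half_unit = Decimal(5).scaleb(exponent - 1)
    return value - half_unit, value + half_unit

for text in ["12.8", "12.80"]:
    low, high = implied_interval(text)
    print(f"{text} cm means between {low} and {high} cm")
```

The measurement is passed as a string because the written form, including any final zero, is itself part of the information being conveyed.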

When a carpenter says “just 8 in.” he probably means “nearer to 8 in.
than to the next division of his rule above or below it,” a fraction of a
division one way or the other usually being unimportant. When a modern
machinist says “just 8 in.,” he may mean “nearer to 8.000 than to 7.999
or to 8.001,” a fraction of a thousandth of an inch usually being negligible
to him. When another person says “just 8 in.,” we must know what kinds
of material he works with before we can tell the meaning of his word
“just.” If decimal subdivisions were used everywhere, the carpenter’s
8 in. would mean 8.0, while the machinist’s would mean 8.000; for one
man “8” would mean “between 7.950 and 8.050,” while for the other it
would mean “between 7.9995 and 8.0005.” It is to avoid such ambiguities
that scientists have adopted the rule that “8” means “between 7.5 and 
8.5;” “8.0” means “between 7.95 and 8.05,” and “8.00” means “between 
7.995 and 8.005.” In other words, no more figures should be written down 
than are thought to be correct, and no figures that are thought to be correct 
should be omitted. In experimental measurements the last figure or even 
the last two figures of an observation may be uncertain. These uncertain 
figures should be written down anyway, since the average of a number of 
such observations is more certain than any single one. The safest rule is to 
retain one or at most two doubtful figures in an observation and only one 
doubtful figure in a final result of the experiment. There is not much danger 
of anyone’s “rounding off” a carefully obtained measurement like 2.836 g 
to 2.84 g merely for the sake of doing some rounding off. There is a very 
decided danger, however, of forgetting to write down a final significant 
zero; if two lengths are 147 mm and 160 mm the tendency when writing 
them in centimeters is to put down 14.7 and 16. If the 0 is as important as 
the 7 when writing millimeters, the same is equally true when writing 
centimeters. 
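
The danger of dropping a final significant zero also arises when a machine does the conversion. The following Python sketch, using the standard decimal module, shifts the decimal point while keeping every written digit; the function name is ours, and the last two lines merely illustrate how ordinary floating-point formatting can lose the zero.

```python
from decimal import Decimal

def mm_to_cm(written_mm):
    """Convert a length written in millimeters to centimeters,
    keeping every digit that was written down."""
    return str(Decimal(written_mm).scaleb(-1))

for mm in ["147", "160"]:
    print(f"{mm} mm = {mm_to_cm(mm)} cm")

# A float keeps no record of how many digits were actually measured,
# and the general-purpose "g" format drops the trailing zero entirely.
print(160 / 10)
print(f"{160 / 10:g}")
```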


* For a discussion of the convention by which 12.75 < 12.8 < 12.84, see 
Section 2-7. 
2-4 RELATIVE ACCURACY

Draw two rectangles of dimensions 20.0 cm X 20.5 cm and 2.0 cm X 
2.5 cm, which are approximate squares. 

The difference between the height and the width of the larger one is 
just the same as the difference between the height and the width of the 
smaller, yet the small rectangle is obviously a less accurate approximation 
to the shape of a perfect square than is the large one. This may serve as an 
illustration of the fact that the main interest may sometimes be in the 
relative accuracy rather than in the absolute accuracy. A difference of 
0.5 cm has the same absolute value wherever it occurs, but it is a con- 
siderable part of a 2-cm length while it is relatively insignificant in com- 
parison with a 20-cm length. 

Accordingly the relative accuracy of a measurement depends on two 
things: the absolute difference and the size of the measurement itself. 
If two points on the earth’s surface are found by careful survey to be 10 
miles (mi) apart, the determination of distance may easily be in error by 
more than 1 ft, and even with the most careful triangulation the error is 
likely to be as much as 4 in. However, an error of ¼ in. in measuring the
thickness of a door could hardly be made even with the clumsiest measuring
device. Nevertheless, it would be misleading to say that the “clumsy”
measurement should be considered necessarily less accurate than the
careful one; ¼ in. is less than 4 in., and one’s attitude should reflect the way
the results are to be used. It is true that it is often important to consider
what fraction of the total measurement the error amounts to. Suppose the
thickness of the door is 1½ in. How large a part of this measurement is
the error of ¼ in.? Obviously it is one-sixth of the total, or an error of more
than 16%, while 4 in. out of 10 mi is roughly an error of 1 out of 150,000,
or about 0.0006 of 1%.

The relative error of a measurement usually does not need to be calcu- 
lated with great care. Where numbers are as different as 0.0006% (10-mi 
survey above) and 16% (thickness of door) the location of the decimal 
point is more important than the size of the significant figures in either 
case; to call the former number “a few ten-thousandths of one percent” 
and the latter “some 10 or 20 percent” gives the important information 
needed. This means that calculations of relative error seldom need to be 
done on paper but can be worked out mentally. 
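
For readers who prefer to see the arithmetic, the two relative errors can be reproduced as follows. This Python sketch takes the door to be 1½ in. thick with an error of ¼ in., values that reproduce the one-sixth ratio quoted above, and converts the 10-mi survey distance to inches so that both quantities are in the same units.

```python
INCHES_PER_MILE = 5280 * 12

def relative_error(abs_error, measurement):
    """Fraction of the measurement that the error amounts to (same units)."""
    return abs_error / measurement

door = relative_error(0.25, 1.5)                     # both in inches
survey = relative_error(4.0, 10 * INCHES_PER_MILE)   # both in inches

print(f"door:   {door:.1%}")
print(f"survey: {survey:.5%} (about 1 part in {1 / survey:,.0f})")
```

Only the order of magnitude of each result matters here, which is why such estimates can usually be made mentally.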

Frequently in reporting results it is desirable to state the percentage of 
difference between two numbers, and an ambiguity arises as to which of 
the two numbers should be divided into their difference. The following 
rules are generally agreed upon. 

The relative difference between 3.11 and π (= 3.14) is 3/314, and the
relative difference between 3.17 and π is likewise 3/314, not 3/317; i.e., the
numerical error is to be divided by a true or theoretical value rather than
by an experimental or erroneous value. It is often desirable, however, to
compare two values which are equally good according to one’s available 
knowledge of them. When there is no standard and no reason for choosing 
one of the measurements over the other, the accepted procedure is to 
divide the difference by the greater value. For example, the numbers 4 and 
5 would be said to differ from each other by 20%, not 25%, for the dif- 
ference divided by the greater number is one-fifth, not one-fourth. 
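
The two rules can be collected into a single routine. The Python sketch below is one possible reading of them: when a true or theoretical value is supplied it is used as the divisor, and otherwise the greater of the two values is used. The function and parameter names are ours.

```python
def relative_difference(a, b, reference=None):
    """Relative difference between a and b.

    If `reference` (a true or theoretical value) is given, divide by it;
    otherwise divide by the greater of the two values.
    """
    divisor = reference if reference is not None else max(abs(a), abs(b))
    return abs(a - b) / abs(divisor)

# Comparison with a theoretical value: both cases give about 3/314.
print(relative_difference(3.11, 3.14, reference=3.14))
print(relative_difference(3.17, 3.14, reference=3.14))

# Two equally good values: divide by the greater, giving 20%, not 25%.
print(relative_difference(4, 5))
```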

2-5 TYPES OF NUMBERS ENCOUNTERED 

In technical and scientific literature the numbers encountered can usually 
be recognized to be of three different types with respect to accuracy: 

1. Numbers obtained from experimental data, which were discussed in 
the preceding paragraphs, especially Section 2-3. They are written with 
as many digits as are justified by the accuracy of their determination. 

2. Exact numbers, such as the 1/2 in the kinetic-energy formula, (1/2)mv²,
or the 4 in the formula for the area of a sphere, 4πr². Such numbers are
always written as shown here but are treated in calculations like 0.50000 . . .
and 4.00000 . . . , respectively, each with as many decimal places as are
required for the particular calculation involved. The number π also falls
in this classification. Exact numbers are easily recognized, so that writing
them as 1/2, 4, and π causes no confusion and avoids the awkwardness that
would result from writing them as 0.50000 . . . , 4.00000 . . . , and 3.14159 . . .

3. Illustrative numbers such as those found in textbook problems. 
These numbers generally come from imaginary experiments and are 
usually chosen as simple integers to simplify the calculations. For example, 
“A sled slides 15 ft from rest in 3 sec. What is its acceleration?” or “The 
legs of a right triangle are 2 and 3 ft long. What is the length of the hypot- 
enuse?” From the answers given for such problems, one usually finds 
that the author desires the student to assume that these numbers are 
respectively 15.0 ft and 3.00 sec, and 2.00 ft and 3.00 ft. In cases like this 
one should treat the numbers given as if they were of reasonable accuracy — 
usually an accuracy that is required to give three significant figures in the 
answer if no other information is available, or an accuracy that is involved 
in the careful use of a ten-inch slide rule. The use of this kind of number in 
illustrative problems is justified by the fact that it simplifies calculations 
and thus makes the learning of fundamental principles more important 
than doing arithmetic. 
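
The two illustrative problems quoted under item 3 can be worked as a short check. The Python sketch below assumes uniform acceleration from rest for the sled, so that s = (1/2)at², and reports each answer to three significant figures as suggested above.

```python
import math

# Treat the stated integers as 15.0 ft, 3.00 sec, 2.00 ft, and 3.00 ft.
distance_ft, time_sec = 15.0, 3.00
acceleration = 2.0 * distance_ft / time_sec ** 2   # from s = (1/2) a t^2

leg_a, leg_b = 2.00, 3.00
hypotenuse = math.hypot(leg_a, leg_b)

# The "g" format with precision 3 keeps three significant figures.
print(f"acceleration = {acceleration:.3g} ft/sec^2")
print(f"hypotenuse   = {hypotenuse:.3g} ft")
```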

2-6 SIGNIFICANT FIGURES 

At first glance, one might think the discussions in this section and in
many succeeding sections outmoded for those who have access to automatic
digital computers. Personal experience, however, has shown that
this is not the case. The material in this book makes a good foundation
on which to build the more advanced methods of numerical analysis needed
for proper use of the larger machines, and it can also be applied immediately
to the use of the smaller computers and desk calculators available
to almost every scientist and engineer.

Table 2-1

Number of significant figures      2            3             4

Numbers having the                 23.          23.2          23.20
indicated number of                0.43         0.432         0.4321
significant figures                0.0078       0.00780       0.007806
                                   63 × 10^3    63.0 × 10^3   63.04 × 10^3

When one wishes to determine the accuracy to be expected from a 
calculation in which experimental quantities are involved, the concept of 
significant figures is simple and very useful. This concept is further helpful 
in eliminating much useless calculation that might be thought necessary 
otherwise. 

The concept is best explained by examples. Each entry in Table 2-1 
has the number of significant figures indicated at the top of the column. 
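
The counting convention illustrated by the table can be stated as a short routine. The Python sketch below is our own and rests on two assumptions: the number is supplied as a string exactly as written, with any power of ten given in the form "63.0e3", and every written digit after the leading zeros is taken to be significant.

```python
def significant_figures(written):
    """Count the significant figures in a number written as a decimal string."""
    mantissa = written.lower().split("e")[0]   # ignore any power-of-ten factor
    digits = mantissa.replace("-", "").replace("+", "").replace(".", "")
    # Leading zeros only locate the decimal point; they are not significant.
    return len(digits.lstrip("0"))

table_entries = [
    "23.", "23.2", "23.20",
    "0.43", "0.432", "0.4321",
    "0.0078", "0.00780", "0.007806",
    "63e3", "63.0e3", "63.04e3",
]
for entry in table_entries:
    print(f"{entry:>10}  {significant_figures(entry)}")
```

Note that trailing zeros are counted, in keeping with the rule of Section 2-3 that no figure should be written unless it is thought to be correct.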

A simple rule for significant figures in division, multiplication, raising 
to powers,* and extracting roots is as follows: Retain the same number 
of significant figures or at most one more in the answer as is contained in the 
component quantity of the least relative accuracy, i.e., that has the smallest 
number of significant figures. In general, time will be wasted in calculations 
of this kind if any quantity used in these calculations contains a number of 
significant figures greater than one more than the number of significant 
figures contained in the least accurate quantity in the calculation. This 
rule is useful because in the operations mentioned the answer will have 
approximately the same percentage of “uncertainty” as the quantity in 
the calculation that has the largest percentage of “uncertainty,” and 
because the number of significant figures in a quantity is a rough measure 
of its percentage of accuracy. 

* The simple rules given here for significant figures apply less accurately to the
operation of raising to powers than they do to the operations of multiplication
and division. This follows from the fact that if a quantity is raised to a power n
the actual percentage of uncertainty in the result is n times as great as that of
the original quantity.

The rule suggests retaining one extra significant figure as a safety
measure, since the significant-figure concept is not accurate enough for
all our purposes. This will be seen from the following examples: Divide
99.8 by 9.94. Considering the “uncertainty” in a number to be one unit
in the last significant figure, the number 99.8 should be between 99.75
and 99.85, which is an uncertainty of 0.1. The result, then, should be 10.04,
not 10.0, since the uncertainties in 99.8, 9.94, and 10.04 are all about 0.1%,
whereas the uncertainty in 10.0 is 1%. As another example, note that 11.1,
23.2, 45.1, 72.8, and 98.2 all have three significant figures, but their
uncertainties are approximately 1, 0.4, 0.2, 0.13, and 0.1% respectively. This
last example shows that for quantities which all have the same number of
significant figures, those quantities having the smallest first digit have the
largest percentage of uncertainty.
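
The percentages quoted above can be checked with a few lines of Python. This is only a sketch: the helper name percent_uncertainty is ours, not the text's, and it assumes, as the text does, an uncertainty of one unit in the last significant figure.

```python
def percent_uncertainty(value, last_place_unit):
    """Percent uncertainty if the value is uncertain by one unit in its last digit."""
    return 100.0 * last_place_unit / abs(value)

# Three-significant-figure numbers from the text, each uncertain by 0.1.
values = [11.1, 23.2, 45.1, 72.8, 98.2]
percents = [percent_uncertainty(v, 0.1) for v in values]

for v, p in zip(values, percents):
    print(f"{v:5.1f}  ->  {p:.2f}%")

# The quotient 99.8 / 9.94 written with three and with four figures.
quotient = 99.8 / 9.94
print(f"quotient = {quotient:.4f}")
print(f"10.0  implies {percent_uncertainty(10.0, 0.1):.2f}%")
print(f"10.04 implies {percent_uncertainty(10.04, 0.01):.2f}%")
```

Writing the quotient as 10.04 keeps its implied uncertainty near that of the data, whereas 10.0 implies an uncertainty about ten times larger.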

Numbers that by these rules are found to have too many significant 
figures should be rounded off until they contain the correct number of 
figures. If the number being rounded off has a smaller first digit than the 
least accurate number, it should be left with one more significant figure 
than the least accurate number. 

The rules just given for the use of significant figures do not apply to 
addition and subtraction. When numbers are to be added or subtracted, 
they should be rounded off until they contain no more decimal places than 
that number in the group that contains the smallest number of decimal places. 
For example, if 2.823 is to be added to 586.3, it should be written 2.8 and 
the sum written as 589.1. The sum is not 589.123, since such a result would 
imply that 586.3 means 586.300, whereas it means only some number be- 
tween 586.25 and 586.35. 
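
The rule for addition can be sketched in Python with the standard decimal module, which keeps track of decimal places exactly. The function name add_measured is hypothetical, and the numbers are entered as strings so that their written decimal places are preserved.

```python
from decimal import Decimal

def add_measured(a, b):
    """Add two decimal strings, keeping only the decimal places both share."""
    da, db = Decimal(a), Decimal(b)
    # A more negative exponent means more decimal places; keep the coarser one.
    coarsest = max(da.as_tuple().exponent, db.as_tuple().exponent)
    quantum = Decimal(1).scaleb(coarsest)
    return da.quantize(quantum) + db.quantize(quantum)

total = add_measured("2.823", "586.3")
print(total)
```

Here 2.823 is first cut back to 2.8, so the sum carries one decimal place, the same as 586.3.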

When calculations must be made with complex formulas, care must be 
taken not to round off the figures too much or too soon. One must be 
especially careful when any quantity appears in more than one place in the 
complex formula. The following problem found in a number of elementary 
texts illustrates the principle. 

A simple pendulum has a period of 2 sec. If an increase in temperature 
makes the pendulum 1 mm longer, how many seconds would it lose in a 
day? 

If the statement that the period is 2 sec means that the period is some- 
where between 1.5 and 2.5 sec, then the problem has no meaningful solu- 
tion, since the uncertainty will be so much greater than the change pro- 
duced by the increased length. Obviously the author did not state his 
problem accurately according to the principles set forth here and, therefore, 
the student must make assumptions as to what the author meant. It 
appears reasonable to assume that he meant the original pendulum would 
make just 86400/2.00000 = 43200 complete swings per day, and therefore, 
the period must be accurate to five or six significant figures. 

The period of one complete swing is given by

$$T = 2\pi\sqrt{L/g}.$$

Hence

$$L_1 = \frac{g(2.00000)^2}{4\pi^2} = \frac{g}{\pi^2} = \frac{980.16}{9.8696} = 99.311 \text{ cm}.$$

Assume that the increase in length is just 1.00 mm; then $L_2 = 99.411$ cm
and $T_2 = 2\pi\sqrt{99.411/980.16} = 2.00101$ sec. The lengthened pendulum
will lose $(2.00101 - 2.00000) \times 43200 = 43.6$ sec per day.

It is obvious that one must carry five or six significant figures in these 
calculations to obtain two or three significant figures in the answer. In 
cases of this kind it is usually possible to derive an approximate expression 
that can be calculated with fewer significant figures and at the same time 
retain the same accuracy. This will be illustrated here; general methods 
will be discussed in the next chapter. 

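In the present case, the equation

$$T_1 = 2\pi\sqrt{L_1/g}$$

can be divided into the equation $T_2 = 2\pi\sqrt{L_2/g}$, giving
$T_2/T_1 = \sqrt{L_2/L_1}$. Set $L_2 = L_1 + l$; then

$$T_2 = T_1\sqrt{1 + \frac{l}{L_1}} = T_1\left(1 + \frac{1}{2}\frac{l}{L_1} - \frac{1}{8}\frac{l^2}{L_1^2} + \cdots\right).$$

This last expression is an expansion by the binomial theorem in which
all terms from the third on in the parentheses are very small. Hence

$$T_2 \cong T_1\left(1 + \frac{l}{2L_1}\right)$$

or

$$T_2 - T_1 \cong \frac{T_1 l}{2L_1}.$$

Now $T_1 = 2.00000$ sec, $l = 0.100$ cm, and $L_1 = 99.311$ cm. A simple
slide rule calculation shows that $(T_2 - T_1) \times 43200 = 43.6$ sec.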
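
The two routes can be compared numerically. The following Python sketch uses the values given in the text (g = 980.16 cm/sec^2, a 2.00000-sec period, a 0.100-cm increase in length); the variable names are ours. Carried without intermediate rounding, both routes come out near 43.5 sec per day, which agrees with the 43.6 sec quoted above to within the rounding of the lengthened period to six figures.

```python
import math

g = 980.16          # cm/sec^2, value used in the text
T1 = 2.00000        # sec, taken as exact, as the text assumes
swings_per_day = 86400 / T1

L1 = g * T1**2 / (4 * math.pi**2)   # original length, cm
l = 0.100                           # increase in length, cm
L2 = L1 + l

# Direct route: recompute the period and subtract two nearly equal numbers.
T2 = 2 * math.pi * math.sqrt(L2 / g)
loss_direct = (T2 - T1) * swings_per_day

# Approximate route: T2 - T1 = T1 * l / (2 * L1), with no near-cancellation.
loss_approx = T1 * l / (2 * L1) * swings_per_day

print(f"L1 = {L1:.3f} cm")
print(f"direct : {loss_direct:.2f} sec per day")
print(f"approx : {loss_approx:.2f} sec per day")
```

The approximate route needs only three figures in l and L1, since no difference of nearly equal numbers is taken.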


2-7 ROUNDING OFF NUMBERS

When a number is rounded off the last digit retained should be increased 
by one whenever the adjacent digit being discarded is greater than 5, 
or if it is 5 and there are digits other than zero to its right. If the discarded 
digit is just 5, and there are no known digits to its right, or there are only 
zeros, the retained digit should be increased by one, if such increase makes 
it an even number. If the retained digit, in this case, is already even, do 
not alter it. This last rule is purely arbitrary, but it does reduce the 
danger of cumulative errors that might be introduced by a habit of always 
increasing or never increasing the last retained digit by one when the dis- 
carded digit is just 5. Of course, this same result would be produced by 
substituting “odd number” for “even number” in the rule, but the use of 
“even number” possesses the slight advantage of avoiding the necessity
of making the decision a second time if the number is later to be divided 
by two. 

For illustration, the following numbers have been properly rounded off 
to three significant figures: 


12.346     12.3
12.350     12.4
12.250     12.2
12.251     12.3
12.351     12.4
12.349     12.3
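
The round-to-even rule is available in Python's standard decimal module as ROUND_HALF_EVEN. The sketch below reproduces the six examples; the function name round_half_even is ours, and the numbers are passed as strings so that a value such as 12.350 is held exactly rather than as a binary approximation.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(text, places="0.1"):
    """Round a decimal string to the given quantum, sending exact halves to the even digit."""
    return Decimal(text).quantize(Decimal(places), rounding=ROUND_HALF_EVEN)

examples = ["12.346", "12.350", "12.250", "12.251", "12.351", "12.349"]
rounded = [str(round_half_even(x)) for x in examples]
for x, r in zip(examples, rounded):
    print(x, "->", r)
```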


There are times, however, when numbers should not be rounded off 
while making the calculations. In solving simultaneous equations it is 
necessary to assume that the numerical coefficients are known precisely, 
just like the $\tfrac{1}{2}$ in the expression $\tfrac{1}{2}mv^2$. Thus during the solution, numbers
must be used that have many more digits than those in the coefficients. 
The reason can be seen by setting up for such a problem a solution by 
determinants. We see that differences between numbers which are very 
close to each other are often needed. These numbers are products of two 
of the pieces of input information and it can turn out that they are identical, 
or nearly so, to the percentage of accuracy, or number of significant figures, 
in one of the pieces of input information. Were these intermediate numbers 
rounded off before taking the difference, this difference could become zero. 
Usually the difference between the unrounded numbers will be closer to 
the input information in numbers of significant figures. The number of 
figures kept in the results should also conform to the input information. 
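
A small numerical illustration may help. The coefficients below are hypothetical, chosen only so that the two products in a second-order determinant nearly cancel; they are not taken from the text.

```python
from decimal import Decimal

# Hypothetical coefficients, each known to three significant figures.
a11, a12 = Decimal("1.23"), Decimal("4.56")
a21, a22 = Decimal("2.47"), Decimal("9.15")

p1 = a11 * a22      # 11.2545
p2 = a12 * a21      # 11.2632

# Determinant from the unrounded products.
det_full = p1 - p2

# Determinant after rounding each product to three significant figures first.
tenth = Decimal("0.1")
det_rounded = p1.quantize(tenth) - p2.quantize(tenth)

print("unrounded products:", p1, p2, "difference:", det_full)
print("rounded products  :", p1.quantize(tenth), p2.quantize(tenth),
      "difference:", det_rounded)
```

Rounding the products before subtracting makes the determinant vanish, and any solution that divides by it is lost; keeping the extra digits preserves the small but nonzero difference.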

2-8 LARGE AND SMALL NUMBERS

To avoid a long string of figures when writing very large or very small 
numbers it is customary to divide a number into two factors, one of them 
being a power of ten. Thus 0.000562 and 23,600,000 are the same as
$562 \times 10^{-6}$ and $23.6 \times 10^{6}$ respectively. This notation also makes it
possible to write 93,000,000 unequivocally with whatever number of significant
figures is desired; it can be put in the form $93 \times 10^{6}$ or $93.0 \times 10^{6}$ or
$93.00 \times 10^{6}$, etc. The same value and accuracy for $93.00 \times 10^{6}$ would
be retained just as well by writing $9.300 \times 10^{7}$ or $930.0 \times 10^{5}$, but
the Committee of the American Physics Teachers that studied this matter 
recommended that, wherever feasible, the power of ten chosen should be 
three or a multiple of three. If all numbers are written in this way, com- 
parison of different quantities will be greatly facilitated. 
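
For those with access to a computer, the recommended notation is easy to produce. The Python sketch below is one possible way of doing it; the function name to_engineering and the output format are our own choices, not a standard.

```python
import math

def to_engineering(x, sig):
    """Write x with `sig` significant figures and a power of ten that is a multiple of three."""
    exp = math.floor(math.log10(abs(x)))      # power of ten of the leading digit
    exp3 = 3 * (exp // 3)                     # nearest multiple of three at or below it
    mantissa = x / 10**exp3                   # lies between 1 and 1000
    decimals = max(sig - 1 - (exp - exp3), 0) # decimal places needed in the mantissa
    return f"{mantissa:.{decimals}f} x 10^{exp3}"

print(to_engineering(0.000562, 3))
print(to_engineering(23_600_000, 3))
print(to_engineering(93_000_000, 4))
```

The number of significant figures must be supplied by the user, since a bare number such as 93,000,000 does not carry that information.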
1. Suppose you are instructed to draw a line “just ten” centimeters long, using
an ordinary centimeter scale. Will its length be best expressed as 10 cm, 
10.0 cm, 10.00 cm, or 10.000 cm? 

Answer: 10.0 cm 

2. The density of water in g/cc is, for the temperature given: 

0°C 0.999841 

4°C 0.999973 

10°C 0.999700 

20°C 0.998203 

30°C 0.995646 

(a) Does water at 4°C have a density of 1.00, 1.000, 1.0000, or 1.00000? 

(b) Correct the following statement by crossing off only the unjustifiable 

figures: “At ordinary room temperatures water has a density of 1.00000.” 
Answer: (a) 1.0000 (b) 1.00 

3. The number $\pi$ is 3.14159265358 . . . Find the percent error in the
approximations: (a) 22/7 for $\pi$, (b) 355/113 for $\pi$, (c) 10 for $\pi^2$. As
described in Item 2 of Section 2-5, 22, 7, 355, etc. are assumed to be exact.

Answer: (a) 0.040% (b) $8.5 \times 10^{-6}$% (c) 1.32%

4. The number e is the limit of the sum of the infinite series 


$$1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots$$


which is 2.7182818 . . . What percent error is made by taking (a) the first 
three terms, (b) the first four terms? 

Answer: (a) 8.0% (b) 1.90% 
Suppose that a ruler graduated in centimeters and millimeters is used to
measure the side of a square, and by estimating tenths of a millimeter the
length is found to be 2.87 cm. The mathematical square of this quantity
is 8.2369 cm$^2$, but we have already seen that only three figures of this
area can be trusted, because we know nothing about the fourth figure of
the measurement from which it is derived. Accordingly, the square is
said to have an area of 8.24 cm$^2$. Similarly, if one side of a square measures
1.03 cm, the measurement being correct to tenths of a millimeter but
nothing being known about hundredths of a millimeter, then its area will
be correctly expressed by the quantity 1.06 cm$^2$ and not 1.0609 cm$^2$, and
the volume of a cube that has this square for one of its sides will be 1.09 cm$^3$
rather than 1.092727 cm$^3$. It should be noted that: (a) with an ordinary
ruler it is impossible to measure a length of a few centimeters with an
accuracy greater than is expressed by three significant figures; (b) the area
or volume calculated from such data cannot be trusted further than its
third significant figure; (c) the example just given suggests a remarkably
simplified process of calculation where some quantity to be squared or
cubed is a little greater than or a little less than unity. Thus,
$(1 + \delta)^2 = 1 + 2\delta$, and $(1 + \delta)^3 = 1 + 3\delta$, provided that $\delta \ll 1$.


3-1 NEGLIGIBLE MAGNITUDES 

The justification of the simplified process of raising a number which is 
approximately unity to a power will be made more evident perhaps by 
the following example. 

Suppose that a metal cube has been constructed accurately enough to 
measure 1.00000 cm along each edge. If it should be brought from a 
cold room into a warm room, a delicate measuring instrument might 
show that the change of temperature had increased each dimension to
1.00012 cm. By unabridged multiplication the area of each side would be
1.0002400144 cm$^2$ and the volume 1.000360043201728 cm$^3$. If the most
careful measurements made it just possible to distinguish units in the fifth
decimal place, then tenths of those units, represented by the sixth decimal
place, would be impossible to measure, and the attempt to state not only
tenths but hundredths and thousandths of those units would be absurd.
By noting that the number 1.0002400144 differs from the value obtained
by abridged multiplication, 1.00024, by only a few thousandths of the
smallest measurable amount, we can see clearly why the area of a
1.00012 cm square must be 1.00024 cm$^2$. Similarly, the volume of the
cube is neither more nor less than 1.00036 cm$^3$, and the string of figures
running out ten more decimal places is absolutely meaningless.

The examples given above suggest that if x is small compared to one,
$(1 + x)^2 = 1 + 2x$, $(1 + x)^3 = 1 + 3x$, and in general,

$$(1 + x)^n = 1 + nx.$$

That is, by the use of the binomial theorem, we find that

$$(1 + x)^n = 1 + nx + \tfrac{1}{2}n(n - 1)x^2 + \cdots,$$

and if x is so small that $n^2x^2$, $n^3x^3$, etc., are negligible compared to $nx$,
we can write

$$(1 + x)^n = 1 + nx.$$

The third term, $\tfrac{1}{2}n(n - 1)x^2$, in the binomial expansion is usually
sufficient for determining the inaccuracy of the approximation. When, as
often happens, the order of magnitude of the inaccuracy is all that is
desired, a mental calculation of $x^2$ will usually suffice.
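
The use of the third term as an estimate of the inaccuracy can be checked on the warmed cube of the earlier example, for which x = 0.00012 and n = 3. The Python sketch below is illustrative only; the function name binomial_approx is ours.

```python
def binomial_approx(x, n):
    """Return (1 + n*x, third term) for (1 + x)**n with small x."""
    first_order = 1 + n * x
    third_term = 0.5 * n * (n - 1) * x**2
    return first_order, third_term

x, n = 0.00012, 3            # the warmed cube of the text
approx, third = binomial_approx(x, n)
exact = (1 + x) ** n

print(f"exact      = {exact:.15f}")
print(f"1 + n x    = {approx:.5f}")
print(f"third term = {third:.2e}")
print(f"actual gap = {exact - approx:.2e}")
```

The third term and the actual gap agree closely, and both are of the order of $x^2$, as the mental estimate suggests.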

This approximation is so useful that it should be thoroughly memorized 
and applied wherever possible. 

An extension of the above approximation is also useful. If a, b, and c
are small compared to unity and if l, m, and n are not too large (e.g.,
less than 4 or 5), then

$$(1 + a)^l (1 + b)^m (1 + c)^n = 1 + la + mb + nc.$$

The following special cases of these formulas are often useful:

$$\frac{1}{1 + a} = 1 - a, \qquad \sqrt{1 + a} = 1 + \frac{a}{2},$$

$$\frac{1 + a}{1 + b} = 1 + a - b, \qquad \sqrt{A^2 + B} = A\sqrt{1 + \frac{B}{A^2}} = A + \frac{B}{2A}.$$

The reader will probably meet many others.
3-2 APPROXIMATIONS WITH DIFFERENTIAL CALCULUS

Differential calculus provides us with an important method of approxima- 
tion. It can best be explained by means of an example. Suppose we wish 
to find the approximate value of sin 31° without the aid of a table. Since 
sin 30° is 0.5, we know that sin 31° will be close to 0.5, and this might be 
called our first approximation. Let us assume that this approximation is 
not sufficiently accurate for our purposes. To obtain a better value, we 
need to know how rapidly the value of the sine function changes with 
respect to the angle itself so that we can calculate how much the value of 
the sine changes as the angle changes from 30° to 31°. When angles are 
measured in radians the change of the sine per unit change in the angle, 
i.e., the rate of change of the sine, is given by $d(\sin x)/dx = \cos x$; the
rate of change of the sine is equal to the cosine. If we multiply the rate
of change of the sine per unit change in the angle by the amount of change
of the angle, we obtain approximately the total change in the sine. Thus,
as the angle changes from 30° to 31° the sine of the angle changes by

$$\cos 30^\circ \times 1^\circ \text{ (in radians)} = \frac{\sqrt{3}}{2} \times \frac{\pi}{180} = 0.01511.$$

Hence sin 31° = 0.5 + 0.01511 = 0.51511. From the tables we find that 
sin 31° = 0.51504, which shows us that this second approximation is 
accurate to 0.014%. The small inaccuracy arises from the fact that our 
assumption that the rate of change of the sine is constant over the 1° 
interval is not exact. 
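
The same steps can be followed in Python, with the library sine standing in for the tables. This is a sketch of the calculation just described; the variable names are ours.

```python
import math

x1 = math.radians(30.0)          # known point
dx = math.radians(1.0)           # 1 degree expressed in radians

first_approx = math.sin(x1)                       # first approximation
second_approx = first_approx + math.cos(x1) * dx  # add rate of change times change
tabulated = math.sin(math.radians(31.0))

percent_error = 100 * (second_approx - tabulated) / tabulated
print(f"second approximation = {second_approx:.5f}")
print(f"sin 31 deg           = {tabulated:.5f}")
print(f"error                = {percent_error:.3f}%")
```

Carried to full machine precision the error comes out slightly larger than the figure obtained from five-place values, but still between 0.01% and 0.02%.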

In general, if $y = f(x)$ and the value $y_1$ of y is known for a given value
$x_1$ of x, then the value $y_2$ of y at $x_1 + \Delta x$ is given by

$$y_2 = y_1 + \left(\frac{dy}{dx}\right)_1 \Delta x. \tag{3-1}$$

This equation is obtained from Taylor’s infinite series by dropping the
terms with powers of $\Delta x$ greater than one; these terms are negligible if
$\Delta x$ is small and if the function y is reasonably well behaved. While Taylor’s
series is treated in most calculus texts, the result for $y_2$ can be understood
without an understanding of that theorem. One need only understand
that $(dy/dx)_1$ represents the rate at which y increases as x increases at the
particular value of x represented by the subscript 1. When the rate of
increase of y with respect to x is multiplied by the change $\Delta x$ in x, the result
is the change in y. Thus $y_2$ could be written, and often is, as $y_1 + \Delta y$ and
Eq. (3-1) as

$$\Delta y = \left(\frac{dy}{dx}\right)_1 \Delta x.$$

Helpful reference can be made to Fig. 3-1, which shows a case where the
expression for $\Delta y$ is not sufficiently accurate. The term in Taylor’s series
which follows $(dy/dx)_1 \Delta x$ is $\tfrac{1}{2}(d^2y/dx^2)_1 (\Delta x)^2$, which is so large for the
function of Fig. 3-1 that an accurate value of $\Delta y$ for the $\Delta x$ shown cannot
be obtained if the term in question is omitted. On the other hand, it can
be seen that the accuracy of the simpler result would be greater for smaller
values of $\Delta x$.


As an example, we may want to obtain the value of $\log_{10} 103$, knowing that
$\log_{10} 100 = 2$. We let

$$y = \log_{10} x = (\log_{10} e)(\log_e x),$$

$$\frac{dy}{dx} = \frac{\log_{10} e}{x} = \frac{0.4343}{x} = \frac{0.4343}{100} = 0.004343.$$

Substitution in Eq. (3-1) gives

$$\log_{10} 103 = 2 + (0.004343) \times 3 = 2.0130,$$

which is in error by only 2 in the last figure given.

FIGURE 3-1

It is easily seen that this approximation can be used only where the 
curve represented by f(x) is continuous and its derivatives are continuous. 
For example, it would be inconceivable to attempt to approximate the 
value of tan 91° from the value of tan 89°. 
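
Both the success with the logarithm and the failure with the tangent can be seen numerically. In the Python sketch below the helper first_order is our own name for the recipe of Eq. (3-1); the derivative of the tangent is taken as 1/cos^2 x.

```python
import math

def first_order(f, dfdx, x1, dx):
    """Estimate f(x1 + dx) from the value and slope at x1, as in Eq. (3-1)."""
    return f(x1) + dfdx(x1) * dx

# log10(103) from log10(100): the slope is log10(e) / x.
log_est = first_order(math.log10, lambda x: math.log10(math.e) / x, 100.0, 3.0)
print(f"log10(103): estimate {log_est:.4f}, actual {math.log10(103):.4f}")

# tan 91 deg from tan 89 deg: the function is discontinuous at 90 deg,
# so the same recipe gives a value with the wrong sign and magnitude.
x1 = math.radians(89.0)
tan_est = first_order(math.tan, lambda x: 1 / math.cos(x) ** 2, x1, math.radians(2.0))
tan_actual = math.tan(math.radians(91.0))
print(f"tan 91 deg: estimate {tan_est:.1f}, actual {tan_actual:.1f}")
```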


Finally, it should be mentioned that Taylor’s series can be applied to 
functions of more than one variable. It will be generally assumed in this 
book that the requirements of the problem will always be met by using 
only the terms containing the first power of the increments of the variables. 
Thus if y is a function f of a set of variables $x_1, x_2, \ldots, x_i$, and some
particular set of values of these is $x_1', x_2', \ldots, x_i'$, then the increment
$\Delta y$ in y obtained when there are increments $\Delta x_1, \Delta x_2, \ldots, \Delta x_i$ in the
variables away from the primed set, is

$$\Delta y = \left(\frac{\partial f}{\partial x_1}\right)_{x'} \Delta x_1 + \left(\frac{\partial f}{\partial x_2}\right)_{x'} \Delta x_2 + \cdots + \left(\frac{\partial f}{\partial x_i}\right)_{x'} \Delta x_i.$$

Here the subscript $x'$ means that the derivatives are to be evaluated at the
primed set of values of the variables. The sign $\partial$ indicates that the
derivative is to be taken only with respect to the one variable; it is clear, however,
that the numerical value of such a partial derivative in general will depend
on the values of all the variables.
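
As an illustration with two variables, consider again the simple pendulum, whose period depends on both L and g. The Python sketch below compares the first-order increment with the directly computed one; the increments dL and dg are hypothetical values chosen for the illustration.

```python
import math

def period(L, g):
    """Period of a simple pendulum, T = 2*pi*sqrt(L/g)."""
    return 2 * math.pi * math.sqrt(L / g)

# Primed (reference) values; the increments are hypothetical.
L0, g0 = 99.311, 980.16      # cm, cm/sec^2
dL, dg = 0.100, 0.050

T0 = period(L0, g0)
# Partial derivatives of T evaluated at the primed values.
dT_dL = T0 / (2 * L0)
dT_dg = -T0 / (2 * g0)

dT_first_order = dT_dL * dL + dT_dg * dg
dT_actual = period(L0 + dL, g0 + dg) - T0

print(f"first-order increment = {dT_first_order:.6f} sec")
print(f"actual increment      = {dT_actual:.6f} sec")
```

Each partial derivative is taken with the other variable held fixed, yet its numerical value depends on both L0 and g0 through T0.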
1. Find approximate values of $(1.066)^n$ and $(0.9988)^n$ for n = 2, 1/2, -2, -1/2,
and 1/3.

Answer:

    n        $(1.066)^n$    $(0.9988)^n$
    2        1.132          0.9976
    1/2      1.033          0.9994
    -2       0.868          1.0024
    -1/2     0.967          1.0006
    1/3      1.022          0.9996

2. Use approximation methods to find: (a) the square root of 50 to four
significant figures, (b) (1.00036)/(1.00364).

Answer: (a) 7.071 (b) 0.99673 

3. When a ballistic pendulum is struck, the height h to which it rises is related 
to the horizontal distance d through which it moves by the equation 

$$h = R - \sqrt{R^2 - d^2}$$

with sufficient accuracy if the length of the pendulum, R, is very large com- 
pared with d. Take R and d to be exactly 292 cm and 15 cm, respectively. 

(a) Find h by using an approximation to the square root. 

(b) If the equation were to be solved exactly, how many significant figures 
must be retained to get the same result that one can get with a slide rule 
by the method of part (a) ? 

Answer: (a) 0.385 cm (b) six 

4. The ratio of the length of the left arm of a chemical balance to that of the
right arm is $\sqrt{W_1/W_2}$, where $W_1$ is the apparent mass of a body when it is
placed in the left-hand pan and $W_2$ is the apparent mass when it is placed
in the right-hand pan. By letting $W_1 = W_2 + \delta$, find an approximate
expression for this ratio.

(a) Evaluate the ratio for $W_1 = 24.5028$ g and $W_2 = 24.5002$ g.

(b) Supposing $W_1$ and $W_2$ in part (a) to be exact, what error is introduced
by use of this approximation? Will this error have any significant effect?

Answer: (a) 1.000053 (b) $5.9 \times 10^{-6}$%

5. Given that

$$y(\sigma) = \frac{1}{\sigma} e^{-1/\sigma^2} \quad \text{and} \quad y(2) = 0.38940.$$

(a) By using Eq. (3-1) find y(2.01).

(b) The third term in Taylor’s series, written to conform to Eq. (3-1), is
$\tfrac{1}{2}(d^2y/dx^2)_1 (\Delta x)^2$. How large would $\Delta\sigma$ be in order that this term affect
the result?

Answer: (a) 0.38843 (b) $\Delta\sigma \sim 0.02$ yields a third term $\sim 5 \times 10^{-6}$



CHAPTER 4

GRAPHIC ANALYSIS


We shall begin in this chapter the direct approach to the analysis of experi- 
mental data. The chapter is titled Graphic Analysis and is indeed the 
proper first approach. Nevertheless, we wish to begin with a word of 
admonition. 

It is unfortunately true that one finds too often in the scientific literature 
such phrases as, “The data appear to fit the equation so and so . . . ,” 
or, “The estimated probable error is ... ” Such statements mean that the 
data have been “analyzed” by methods that are not describable precisely. 
It may well be true that the data being described do not follow exactly 
the normal curve of error, which is the foundation of most computational 
methods in common use, but even so, the application of the normal error 
curve via actual computation can be described precisely, and the numerical 
results obtained could be reproduced by anyone. 

On the other hand, familiarity with methods of graphical analysis gives 
one a “feeling” for such mathematical concepts as continuity, derivative, 
approach to an asymptote, and so on. It gives one a better understanding 
of the meaning of residual, which we shall define later, and of the difference 
between the residual and the hypothetical true error. The methods of 
graphical analysis are an invaluable aid in the understanding of more 
sophisticated computational methods. 

4-1 CURVE PLOTTING 

It is usually desirable to plot all experimental data on coordinate paper 
and to draw smooth curves which fit the plotted points as closely as 
possible. This is desirable not only because the curves so plotted convey a 
clearer picture to the mind of the results of the experiment than can the 
data themselves, but also because curves so plotted can frequently be 
used to give certain accurate information which cannot be obtained other- 
wise. The shapes of such curves may serve to verify existing laws or may 
suggest laws which were not previously known. Smooth curves should 
be drawn through the plotted points because our past experience tells us
that physical changes are almost never abrupt. In fact, changes which 
at first seem abrupt or discontinuous are usually found to be continuous 
on closer examination. If the surface of a solid standing in air is examined 
closely by modern methods, a transition region will be found where sub- 
stance of the nature of air gradually changes into substance of the nature 
of the solid. 

Occasionally one may miss certain rapid changes by the process of
“smoothing,” in cases where just one or two points lie off the smooth
curve. Regnault, the great French physicist, thought that the vapor 
pressure curve of ice was continuous with that of water and made one 
smooth curve link up all the readings. It is now known that at 0°C the 
two curves should intersect at slightly different slopes. An investigation 
of Regnault’s diagram shows that his data had actually indicated this 
fact, but he smoothed it out. If at any time one finds a point which does 
not lie on a smooth curve and suspects that it is not just accidental, he 
should make further experiments in the neighborhood of that point to 
better determine the shape of the curve. If further experiments cannot 
be performed, he is not justified in drawing anything but a smooth curve. 

The usefulness of the plotted curve depends to a great extent on the 
choice of scales used in plotting. Three rules which should be followed 
in the choice of scales are: (a) choose scales that make plotting and reading
easy, e.g., put round numbers on heavy lines of the coordinate paper; 
(b) choose scales so that the curve will nearly cover the whole sheet of 
graph paper; (c) choose scales such that the accuracy of plotting the 
points is nearly equal to the accuracy of the data plotted. If, for example, 
plotting can be done much more accurately than is justified by the accuracy 
of the data, the points will be unduly scattered and make it difficult to 
judge the shape of the curve. It is obvious that it will often be impossible 
to follow all three of these rules at the same time, in which case a com- 
promise will be necessary. Rule (a) is the most important of the three 
because “odd” scales invariably lead to errors when points are read off 
the curves. Whether one should give more weight to rule (b) or to rule (c) 
will depend somewhat on the purposes of the plot, convenience in use, 
and the nature of the data. For instance, one often has the case of a 
dependent variable, y say, governed by the values of two independent 
variables, x and T; and sometimes a plot of y vs. x for a single value of T
might well cover a conveniently sized graph paper so that the plot for a
second value of T would not fit on the same sheet of paper. Thus if one
wishes to show, on the same plot, y vs. x for several values of T, he must
abandon rule (c) in favor of rule (b). On the other hand, the larger scaled
plot might be necessary to show the true nature of the dependence of y
on x at a single value of T.
4-2 CURVE FITTING

A curve showing the relationship between the variables measured is very 
useful. It gives a clear idea of how a variation of one quantity affects the 
other. If one can find a mathematical equation that fits this curve, much 
more information can be gained. 

For example, suppose a calorimeter cup of known heat capacity is 
filled with a measured quantity of hot water and a record of its temperature 
at one-minute intervals is taken. If the temperature is plotted as a function 
of time, the smooth curve drawn through these points will show that the 
temperature drops rapidly at first and more slowly as the temperature 
approaches that of the room. Thus one may infer from the curve that the 
cup and water lose heat most rapidly when their temperature is high above 
that of the surroundings. 

However, more information can be obtained if a mathematical relation- 
ship can be found connecting the temperature with time. By methods that 
will shortly be described, one should be able to show that the equation 

$$\theta = \theta_0 e^{-bt}$$

will fit this experimental curve, where $\theta$ is the temperature difference
between the calorimeter and its surroundings, $\theta_0$ is the value of $\theta$ at time
t = 0, and b is a constant which depends on the apparatus.

The fitting of the equation

$$\theta = \theta_0 e^{-bt}$$

to the curve will provide numerical values for b and $\theta_0$. By differentiating
the equation one finds that

$$\frac{d\theta}{dt} = -\theta_0 b e^{-bt} = -b\theta.$$

This equation tells us that the rate at which the temperature difference
drops is proportional to that temperature difference with b as the constant
of proportionality. Furthermore, the rate at which heat is lost from the
calorimeter is given by

$$-h\frac{d\theta}{dt} = bh\theta,$$

where h is the known heat capacity of the calorimeter and the water.
Therefore, bh gives the number of calories of heat given off by the calorim- 
eter per second per degree difference in temperature. Thus a great deal 
of useful information can be obtained from an experimental curve, 
especially if a mathematical equation can be fitted to that curve. We shall 
now take up methods for fitting equations to curves. 
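
Before doing so, it may help to see the relations above in numbers. The Python sketch below uses hypothetical values for the initial temperature difference, the constant b, and the heat capacity h (none of them from the text, and with time measured in minutes), and checks that the slope of the cooling curve equals -b times the temperature difference.

```python
import math

theta0 = 40.0    # initial temperature difference, deg C (hypothetical)
b = 0.050        # cooling constant, per minute (hypothetical)
h = 250.0        # heat capacity of cup plus water, cal per deg C (hypothetical)

def theta(t):
    """Temperature difference at time t, theta = theta0 * exp(-b t)."""
    return theta0 * math.exp(-b * t)

t, dt = 10.0, 1e-4
# Central-difference estimate of d(theta)/dt, compared with -b * theta.
slope_numeric = (theta(t + dt) - theta(t - dt)) / (2 * dt)
slope_formula = -b * theta(t)

heat_loss_rate = -h * slope_formula          # equals b * h * theta
print(f"d(theta)/dt: numeric {slope_numeric:.5f}, formula {slope_formula:.5f}")
print(f"heat loss rate = {heat_loss_rate:.1f} cal/min")
print(f"per degree of difference = {heat_loss_rate / theta(t):.2f} cal/min/deg")
```

The last figure printed is simply the product bh, whatever the time at which it is evaluated.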
4-3 STRAIGHT-LINE LAW

It is very important to know as much as possible about the fitting of 
straight lines to experimental points. This is true not only because many 
experiments yield data that naturally follow straight lines, but also 
because nearly all data can be plotted in some manner so that they will 
follow a straight line. The straight-line plot yields so much information 
so easily that it is standard practice wherever possible to arrange the data 
so that they can be fitted in this way. The details of transforming data so 
that they follow a straight line will be discussed in the next two sections. 
The treatment of data that naturally follow a straight line must be dis- 
cussed first. 

The equation $y = a + bx$ has a straight line for its “curve,” and this
line cuts the y-axis at a height a, and has a slope that is numerically equal
to b. Computational methods will be described later by which the best, 
or most probable, values of a and b can be determined. These methods 
can be extended, in principle and frequently in practice, to functions which 
are much more complicated than the linear relationship between two 
variables. Even the most tedious of such problems are easily handled
with the aid of automatic computers once the programming has been done.
Indeed, the majority of such problems can be done on a desk calculator 
without too much difficulty. As discussed in the introduction to this 
chapter, results which are determined indirectly from experimental 
data, and which are to be made available for use by others, should always 
be computed by precisely describable methods. Nevertheless, many of 
the graphical methods that were used before the advent of the computers 
or before the increasing availability of desk calculators remain useful. 
They are not only quick ways to get sufficiently good results for inter- 
mediate use but, because of their basic nature, they give a better under- 
standing of the principles that underlie the computations. Finally, and 
particularly in conjunction with the use of automatic computers, they 
provide a good way of checking computed results. Hence we shall describe 
some of them. 

If a set of experimental measurements, such as those of temperature and 
length of a metal bar, are found to correspond approximately to a straight- 
line law, they may be plotted as the x’s and y’s of a graphic diagram, and 
their irregularities may be eliminated or “smoothed,” by drawing the 
straight line that appears to come closest to all of the points. 

It is desirable to use the “black-thread method” to locate the best 
position of the line in preference to a ruler or other ordinary straight 
edge since the thread and the points can all be seen at the same time; a 
ruler would hide half of the points if properly placed. More convenient 
than the black thread is a strip of transparent celluloid with a fine straight 
line scratched down the middle of one face. 
As an example, consider the data in
Table 4-1. It is suggested that the reader 
plot the values given in the table as ac- 
curately as possible, making use of the 
rules given in Section 4-1, and marking 
each point by a minute dot surrounded 
by a small circle. 

Be sure that the coordinate paper rests 
in a perfectly flat position and stretch a 
fine black silk thread on it in such a posi- 
tion that it follows the general direction 
of the points. Move it a trifle toward the 
top or bottom of the page, also rotate it 
slightly, both clockwise and counter- 
clockwise. Attempt to get it into such a 
position that it lies among the points, 
following their general trend but not necessarily passing exactly through 
any one of them. See that there are about as many points above the line 
as below it; if the high points are more numerous toward one end of the 
thread and the low ones toward the other end, rotate the thread enough to 
remedy the condition. When the thread is finally arranged in the most 
satisfactory position, do not attempt to draw the line at once but notice 
where the thread cuts the x-axis and where it cuts the y-axis, or where it
passes through some easily remembered intersections on the paper. From 
these numbers calculate the slope of the thread, noting whether its value is 
positive or negative. Write the equation of the line indicated by the thread. 

The equation x/m + y/n = 1 is another form of the straight-line 
equation, and is easily reducible to the form y = a + bx. Since the graph 
of x/m + y/n = 1 passes through the points (0, n) and (m, 0), m and 
n are respectively the x- and y-intercepts of the line. The equation of a 
line that cuts both axes can be written immediately without any calcula- 
tion. The constants a and b in the straight-line equation can be obtained 
from the intercepts m and n on the axes or from any other two points far 
apart on the line. 
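
The conversion just described can be sketched in a few lines of Python. This is only an illustration: the function names and the sample intercepts are assumptions made here, and they are not values read from Table 4-1.

```python
def intercepts_to_slope_form(m, n):
    """Convert x/m + y/n = 1 into y = a + b*x.

    m is the x-intercept and n is the y-intercept; both must be nonzero.
    """
    if m == 0 or n == 0:
        raise ValueError("a line through the origin has no intercept form")
    a = n          # setting x = 0 in the intercept form gives y = n
    b = -n / m     # slope of the line through (0, n) and (m, 0)
    return a, b


def two_points_to_slope_form(x1, y1, x2, y2):
    """Return (a, b) of y = a + b*x through two well-separated points."""
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b


# Hypothetical reading: the thread cuts the x-axis at 4 and the y-axis at 2.
a, b = intercepts_to_slope_form(4.0, 2.0)
print(a, b)
```

The second function covers the case in which the thread does not cut both axes on the paper, so that two other widely separated points must be used.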

It should be noted that since coordinates are physical quantities and 
have units, the constants m, n, or a, b also have units. To make the first 
equation consistent, m and n must have the same units as x and y 
respectively; and to make the second equation consistent, a must have the 
units of y, and b must have the units of y/x. For example, if these 
constants are determined from a plot of the data in Table 4-1, the units are 
degrees C for m, mm for n, mm for a, and mm/°C for b. 

Sometimes data will be so good that when they are plotted on graph 
paper no larger than 8.5 by 11 in., most of the available accuracy is lost; 
i.e., the plotted points fall more accurately on the straight line than can 
be shown on the plot. Then larger sheets of graph paper are useful. However, 
large sheets are inconvenient and it is difficult to make accurate 
plots on them. For these situations a device called the method of “residual 
plot” will be found useful. 

Table 4-1

   x, °C     y, mm
    100      909.8
     90      908.5
     80      908.0
     70      907.2
     60      906.7
     50      906.5
     40      906.2
     30      905.5
     20      905.0
     10      904.1
      0      903.9
    -10      903.2
    -20      902.3

FIGURE 4-1 (plot of $\Delta y$ against x; the $\Delta y$ axis runs from -0.02 to 0.07)

The method of residual plot is as follows: First, determine approximate 
values for the constants a and b of the straight-line equation; call these a' 
and b'. Second, calculate values of y' for each of the x’s where 
y' = a' + b'x. Third, calculate values of $\Delta y = y - y'$. Fourth, plot 
values of $\Delta y$ against x to a larger scale than y was originally 
plotted. Fifth, determine the slope $\Delta b$ and the intercept $\Delta a$ 
of this residual plot. Sixth, calculate accurate values for the constants 
from $a = a' + \Delta a$ and $b = b' + \Delta b$. The effectiveness of this 
method arises from the fact that the $\Delta y$’s are so small that they can 
be plotted on a small sheet of graph paper to a much larger scale than the 
y’s can be plotted. 

As an example of the method of residual plot, assume that the plotted 
points are (1, 5.002), (2, 7.034), (3, 9.059), and (4, 11.068). Determine 
approximate values of a and b from the trial straight line of Fig. 4-1. 
These are 

    a' = 3.0 and b' = 2.000. 

Determine y' from y' = a' + b'x and $\Delta y$ from $\Delta y = y - y'$, as 
shown in Table 4-2. 

Table 4-2

   x        y        y'     $\Delta y$
   1      5.002     5.0      0.002
   2      7.034     7.0      0.034
   3      9.059     9.0      0.059
   4     11.068    11.0      0.068

From the plot of $\Delta y$ vs. x shown in Fig. 4-1, we find that 
$\Delta a = -0.015$ and $\Delta b = 0.023$. Since $a = a' + \Delta a$ and 
$b = b' + \Delta b$, 

    a = 2.985 and b = 2.023. 
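
The same six steps can be sketched in Python. One substitution should be noted plainly: where the text reads the slope and intercept of the residual plot from a graph, the sketch below fits the residuals by ordinary least squares, which is an assumption made here for the sake of a runnable example and not the graphical procedure itself. The helper name `fit_line` is likewise hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


xs = [1, 2, 3, 4]
ys = [5.002, 7.034, 9.059, 11.068]
a_trial, b_trial = 3.0, 2.000            # first and second steps: trial line

# Third step: residuals of the data about the trial line.
residuals = [y - (a_trial + b_trial * x) for x, y in zip(xs, ys)]

# Fifth step (least squares here, instead of reading a graph).
delta_a, delta_b = fit_line(xs, residuals)

# Sixth step: corrected constants.
a = a_trial + delta_a
b = b_trial + delta_b
print(round(a, 3), round(b, 3))
```

The least-squares corrections come out close to the graphical ones; the slope correction differs from the value read off the plot only in the third decimal place, which is about what one expects from reading a graph by eye.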

4-4 NONLINEAR LAWS 

Whenever the experimental points lie on a straight line, the law of variation 
is easily deduced. When the experimental points do not lie on a straight 
line, as frequently happens, the problem is more difficult. However, if 
one can guess the proper mathematical equation, it is usually a simple 
matter to fit this equation to the points. Frequently, the law of variation 
will be known from theoretical considerations and the problem is then 
one of determining the constants of the equation by fitting the known 
mathematical equation to the experimental data. If there is no way of 
knowing the law except from the shape of the smoothed curve, the problem 
is one of trial and error. 

The simplest method of fitting nonlinear equations to experimental 
data can best be explained by means of an example. Suppose that the 
equation $y^2 = a + bx^3$ is to be fitted to the data of Table 4-3, which are 
suspected of following that law. We note that if we set $y^2$ equal to a new 
variable v and $x^3$ equal to u, the equation will become v = a + bu, which 
is linear in v and u. Therefore, a plot of v against u will be linear and can 
be treated exactly as in Section 4-3. Thus, although a plot (y vs. x) of the 
figures in the first two columns will not give a straight line, the figures in 
the last two columns of Table 4-3 will. The value of b is given by the 
slope and the value of a by the intercept on the v-axis of the straight line 
drawn. 


Table 4-3

   x       y      $u = x^3$    $v = y^2$
   1      2.2         1          4.8
   2      4.4         8         19.4
   3      7.6        27         57.8
   4     11.4        64        130.0
   5     15.9       125        252.8

If we had picked the wrong equation in the first place, the new variables 
plotted would not lie on a straight line and another equation would have 
to be tried. 
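
A short Python sketch of this change of variables follows. It uses the x and y columns of Table 4-3, forms u and v from them, and fits a straight line to v against u. The least-squares helper `fit_line` is an assumption introduced here in place of the black thread; the size of the largest deviation is printed as a rough check on whether the chosen equation was a sensible one.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.4, 7.6, 11.4, 15.9]

us = [x ** 3 for x in xs]   # u = x^3
vs = [y ** 2 for y in ys]   # v = y^2

a, b = fit_line(us, vs)     # v = a + b*u
worst = max(abs(v - (a + b * u)) for u, v in zip(us, vs))
print(f"a = {a:.2f}, b = {b:.3f}, largest deviation in v = {worst:.2f}")
```

For these data the slope comes out very nearly 2 and the intercept near 3, with deviations that are small compared with the range of v. Had the wrong equation been chosen, the deviations would show a systematic curvature instead.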

The mathematical functions which most often appear as relations 
between physical variables are the trigonometric functions, the exponential 
and logarithmic functions, and powers. It is advisable to become very 
familiar with the shapes of these functions, alone and in combination, by 
making graphical plots of equations like the following, with constants a, b, 
c of various sizes and algebraic signs: 

$$y = a \sin bx, \qquad y = a \cos bx,$$

$$y = a \tan bx,$$

$$y = a e^{b/x} \quad \left(\text{or } \ln y = \ln a + \frac{b}{x}\right),$$

$$y = a e^{-bx} \quad (\text{with } b \text{ positive}),$$

$$y = a e^{bx} \cos cx, \qquad y = a\left(1 - e^{-bx}\right),$$

and finally,* 

$$y = \sum_{n=0} a_n x^n.$$

Constants for this last equation could be obtained by using as many 
points from the smoothed curve as there are constants in the equation 
selected. The values of x and y for the individual points can then be 
substituted into the equation. This will give as many equations as there 
are unknowns so that the values of the unknowns can be obtained by 
solving the equations simultaneously. Obviously it is best, if the shape of 
the curve permits, to use as few terms in the equation as possible so as to 
reduce the labor required in determining the constants. 
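
As a minimal sketch of this procedure, the following Python code takes as many points as there are constants, writes one equation per point, and solves the set simultaneously by elimination. The function name and the three sample points are hypothetical; the points were chosen here to lie exactly on y = 1 + 2x + 3x^2 so that the result can be checked by inspection.

```python
from fractions import Fraction


def polynomial_through_points(points):
    """Solve for a_0..a_k in y = sum(a_n * x**n) through k+1 points."""
    size = len(points)
    # Augmented matrix: one row [1, x, x^2, ..., y] per chosen point.
    rows = [[Fraction(x) ** n for n in range(size)] + [Fraction(y)]
            for x, y in points]
    for col in range(size):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row, then clear this column from every other row.
        rows[col] = [value / rows[col][col] for value in rows[col]]
        for r in range(size):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# Hypothetical points read from a smoothed curve.
coefficients = polynomial_through_points([(0, 1), (1, 6), (2, 17)])
print([float(c) for c in coefficients])
```

Exact fractions are used so that the elimination introduces no rounding of its own. The points must have distinct x values, and the labor grows quickly with the number of terms, which is the practical reason given above for keeping the equation short.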

More importantly, the inclusion of too many terms tends to cause the 
recognition of imprecision as precision. The inclusion of seven terms in a 
sum of powers to fit the data of Fig. 4-2 could produce the solid curve 
when, in fact, the dashed line might represent the data better. In other 
words, what is meant by, and perhaps what is lost by, “smoothing” must 
be carefully considered. 

* $\sum$ is the standard symbol for “sum of.” The appended equation, then, 
means $a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots$. If an upper limit to n is 
specified, as 

$$\sum_{n=0}^{1} a_n x^n,$$

the result will be a polynomial. To specify an infinite series, the sign 
$\infty$ is given as the upper limit. The lower limit need not be zero, of 
course, but note that 

$$\sum_{n=-2}^{2} a_n x^n,$$

say, is infinite at x equal to zero. 


Fig. 4-2. Data scattered about a straight line (dashed) could suggest 
remarkably complex physical laws (solid line) if care is not taken. 
Conversely, one must guard against mistaking complexity for scatter. 

PROBLEMS 



1. In an experiment designed to determine the relation between two variables 
the following results were obtained. 

   x, lb     y, in.
    10       -1.30
    20       -0.91
    30       -0.49
    40       -0.03
    50        0.34
    60        0.81
    70        1.16
    80        1.65

Plot a graph showing the relation between x and y; use a black thread and 
express the relation in the form y = mx + C. 

Answer: y = 0.0417x - 1.731 

2. By using the method of the residual plot, find the resistance $R_0$ at 0°C 
and the temperature coefficient $\alpha$, where the resistance is 
$R = R_0(1 + \alpha t)$, for a sample of wire giving the following data: 

   R (ohm)   2.0700   2.1465   2.2225   2.2990   2.3750   2.4515
   t (°C)        10       20       30       40       50       60

Answer: $R_0 = 1.9939$ ohm, $\alpha = 3.824 \times 10^{-3}\,^{\circ}\mathrm{C}^{-1}$ 

3. Observations are made of the excess temperature $\theta$ of a cooling body 
as a function of the time t. Determine whether the data follow Newton’s law 
of cooling, $\theta = \theta_0 e^{-bt}$, and if so, determine the constants 
$\theta_0$ and b by graphical methods. 

   t (min)             0      4      8     12     16     20
   $\theta$ (°C)    40.5   27.0   18.0   12.0    8.0    5.3

Answer: $\theta_0 = 40.5$°C, $b = 0.1015\ \mathrm{min}^{-1}$ 

4. Assume that the following data have been obtained from a laboratory 
experiment in which the time intervals were known to a high degree of 
precision. There is reason to believe that the data follow one of the two 
theoretical equations: 

    y = a + bx    or    y = a/x - bx 

The data are: 

   x (sec)        1        2        3        4
   y (cm)    10.025    2.006   -2.006   -5.012

Determine by plotting which equation fits the data better and determine 
the values and the units of the constants a and b in the better equation by 
using the method of the residual plot. 

Answer: Prefer the second form; a = 12.0311 cm sec, b = 2.00514 cm/sec. 

5. Find the constants A and B for Cauchy’s equation 
($\mu = A + B/\lambda^2$) for the index of refraction of light through a 
refracting medium. The data given are for sodium light refracted through 
water. Does the equation appear to be inaccurate in any region? If so, state 
which region. 

   $\mu$             1.4040   1.3435   1.3371   1.3330   1.3307   1.3289
   $\lambda$ (Å)       2144     3968     4861     5893     6708     7685

Answer: The equation is very poor at the shortest wavelength. An expanded 
plot of the rest of the data shows that the equation is not very good 
anywhere. Nevertheless, in the region of the longer wavelengths, A = 1.3238, 
$B = 3.126 \times 10^{-11}\ \mathrm{cm}^2$. Note that the problem implies 
that one is to plot $\mu$ vs. $1/\lambda^2$. An alternative method, 
$\mu\lambda^2$ vs. $\lambda^2$, does not show up the inadequacies of the 
equation. 

Numerical quantities which are found in handbooks or quantities which 
are used in industrial and experimental work are, in the last analysis, 
determined by some measurement which is either direct or indirect. 
Whenever measurements are made, there are errors present. Increasing 
the accuracy of a measurement is accomplished by reducing the magnitude 
of the errors but never by eliminating all of them. If, for example, all 
systematic errors have been eliminated in a particular measurement, 
certain accidental errors always remain. Errors then are divided into two 
general classes: systematic errors and accidental or chance errors. Generally, 
we associate the first type with “accuracy” and the second type with 
“precision.” 

5-1 SYSTEMATIC ERRORS 

Systematic errors may be divided into several classes: theoretical, instru- 
mental, and personal errors. 

Theoretical errors. Errors due to the expansion of the measuring tape 
with temperature, the buoyant effect of air on weights in the chemical 
balance, the refraction of light in surveying, the air friction in gravitational- 
acceleration measurements, or the variation of the voltage being measured 
caused by the current drawn by the voltmeter, etc. are all errors of this class. 
They are reproducible effects and are often of a greater magnitude than the 
chance errors that would be found in such measurements; they constitute 
errors when unrecognized. Once recognized, they can be measured if 
necessary or, more generally, eliminated by calculating the magnitude of 
their effect and making the correction. For example, if an important dis- 
tance in an apparatus is defined by the length of a steel bar of known co- 
efficient of thermal expansion, whose temperature changes during the course 
of the experiment, then any measurement with this apparatus contains a 
systematic error if that temperature change is not taken into consideration. 
The error is eliminated by measuring the length at one temperature and 
calculating it for other temperatures. 
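
A minimal sketch of such a correction is given below in Python, using the usual linear expansion law. Every number in it is an assumption chosen for illustration: the bar length, the two temperatures, and the coefficient, which is only of the order of magnitude that handbooks quote for steel.

```python
def length_at_temperature(length_ref, temp_ref, temp, alpha):
    """Length of a bar at temp, given its length at temp_ref.

    Uses the linear law L = L_ref * (1 + alpha * (temp - temp_ref)).
    """
    return length_ref * (1.0 + alpha * (temp - temp_ref))


# Assumed values, for illustration only.
ALPHA_STEEL = 1.2e-5      # per degree C, typical order of magnitude for steel
length_20 = 500.000       # mm, taken as measured at 20 degrees C

length_30 = length_at_temperature(length_20, 20.0, 30.0, ALPHA_STEEL)
correction = length_30 - length_20
print(f"{correction:.3f} mm")
```

Under these assumptions a ten-degree change alters the defining length by several hundredths of a millimeter, which would be a systematic error in any measurement that ignored it.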

Instrumental errors. Errors in the division of graduated scales on 
instruments, the eccentricity of graduated circles, which should be con- 
centric, the inequality of balance arms, the inaccuracy of the calibration 
of electrical indicating instruments or thermometers are errors of this 
class. These errors can be eliminated or reduced by applying experi- 
mentally determined corrections or by performing the experiment in such 
a manner as to eliminate their effects. 

Personal errors. These errors are due to personal peculiarities of the 
observer, who may always answer a signal too soon or too late, or who 
may always estimate a quantity to be smaller than it is, etc. The character 
and magnitude of these errors may be determined by a study of the 
observer. His “personal equation” may be obtained and his observations 
corrected for these sources of error. 

Mistakes. Strictly speaking, mistakes are not errors, for mistakes will 
not occur when the observer takes sufficient care, whereas errors cannot 
be eliminated by care alone. However, the best observers will occasionally 
relax their vigilance and make mistakes. Usually these mistakes can be 
detected if every observation is recorded directly on a data sheet or in a 
data notebook and not altered. The figure 8 is frequently mistaken for 
the figure 3 and vice versa. An angle of 52° may be read 48° when the 
observer notes that it is just two degrees away from 50°. Such mistakes 
are usually obvious in the original data but may be covered up if the 
results of calculations are recorded instead of the original data. For this 
reason it is imperative to record the actual readings, not just the dif- 
ference between two readings. It is also important to avoid erasing any 
entry which may be thought to be wrong, since it may later turn out to 
have been correct. When an observation is thought to be mistaken, it is 
best to draw a neat line through it to indicate the desire to reject it. 

5-2 CHANCE ERRORS 

Chance errors are due to irregular causes. These errors are most likely to 
appear when accurate measurements are being made, for when instru- 
ments are built or adjusted to measure small quantities, the fluctuations 
in the observations become more noticeable. For example, if one wishes to 
measure the length of a laboratory table to the nearest foot, there would be 
no excuse for any observation differing from any other observation no 
matter how many times the measurement is made. However, if one 
attempts to measure the length of the same table to the nearest 1/1000 in., 
the individual observations will be sure to differ greatly among themselves 
provided that these observations are performed with care and without 
prejudice. 

When all of the systematic errors have been eliminated or reduced to a 
minimum, the remaining chance errors will be the chief source of impre- 
cision. Fortunately, most chance errors have been found to follow a 
definite mathematical law, called the normal or Gaussian distribution of 
errors. When the errors are distributed in such a predictable way, they 
can be examined in such a manner as to increase the reliability of the 
results. By studying the distribution of errors and the laws governing the 
probability of the occurrence of errors, one may greatly increase the 
precision of his experimental results. The treatment of chance errors forms 
a large part of the subject of precision of measurements. The laws of 
probability will be treated in the next chapter as a prelude to the treatment 
of chance errors. 

PROBABILITY 


In order to understand the treatment of chance errors it is not necessary 
to have a highly sophisticated knowledge of the laws of probability, but a 
basic knowledge is necessary. The following brief treatment will suffice 
for the purpose. If a more advanced understanding of the subject is 
desired, one should consult standard texts [1, 2]* on the subject. 

There are two general methods for determining the probability that an 
event will occur. The analytical method is appropriate when the event 
can be broken down into more basic events, whose probabilities of occur- 
rence are known on some a priori basis. When this cannot be done, it is 
necessary to resort to the experimental method. These two methods will 
be introduced in the order in which they are mentioned here, but in most 
of the subsequent illustrative discussion we shall return to the analytical 
method. 

6-1 ELEMENTARY LAWS OF CHANCE: THE 
ANALYTICAL METHOD 

The laws of chance, which apply to many situations, including chance 
errors in measurement, are most easily understood by considering the 
tossing of perfect coins and the throwing of perfect dice. Here the basic, 
elementary event is the appearance of a particular face of a single coin 
or a single die. In each case there exists an a priori basis by which the 
probability of the occurrence of such an event can be determined. If a 
perfect coin of negligible thickness or with a rounded edge is tossed once, 
only one of two possible events can occur. Hence the probability of the 
occurrence of either is just 1/2. Similarly, the probability of the 
appearance of any one face of a perfectly balanced die tossed once is just 
1/6. 


* Numbers in brackets are keyed to the References at the end of the book. 

When a complex event, such as the appearance of five heads when 
seven coins are tossed, is broken down and discussed on the basis of the 
a priori probability of the occurrence of a head when a single coin is tossed 
once, it is the analytical method that is being applied. 


6-2 THE EXPERIMENTAL METHOD 

This method can be introduced with an example in which it is not actually 
needed. The example is provided by the data of Table 6-1, which shows 
the results of successively greater numbers of tosses of a coin. The basis 
of the experimental method lies in the statement that in general the 
probability of an event occurring in one trial is defined as the limit of the 
ratio of the number of occurrences to the number of trials when the latter 
increases indefinitely. Since, in fact, no coin for instance can be known 
absolutely to be perfect, the choice between the use of the analytical or the 
experimental methods must depend on what is at stake. The application 
by gamblers of the experimental method to a roulette wheel which paid 
on the basis of the analytical method has been known to force the sub- 
stitution of a new wheel. 

Table 6-1

   Number of
   throws      Heads    Tails    Difference    Ratio
       10         7        3         4         0.700
       20        12        8         4         0.600
       40        23       17         6         0.575
      100        55       45        10         0.550
      200       108       92        16         0.540
      500       257      243        14         0.514
     1000       513      487        26         0.513
     2000      1016      984        32         0.508


There is a further interesting observation to be made from the data of 
Table 6-1. As the number of throws increased, the ratio did approach 0.5, 
but the difference between the number of heads and the number of tails 
thrown tended to increase, in this trial. 
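
The behavior of the ratio and of the difference can be explored with a simulated coin. The Python sketch below is an illustration only: it uses the standard pseudo-random generator in place of a real coin, the seed is an arbitrary choice made here, and each line of output is a separate run, not a continuation of the previous one.

```python
import random


def toss_summary(n_throws, rng):
    """Toss a simulated fair coin n_throws times; return heads, tails, ratio."""
    heads = sum(rng.random() < 0.5 for _ in range(n_throws))
    tails = n_throws - heads
    return heads, tails, heads / n_throws


rng = random.Random(1)  # fixed seed so the run is repeatable (arbitrary choice)
for n in (10, 100, 1000, 10000, 100000):
    heads, tails, ratio = toss_summary(n, rng)
    print(f"{n:>6} throws: heads={heads}, tails={tails}, "
          f"difference={abs(heads - tails)}, ratio={ratio:.3f}")
```

In runs of this kind the ratio settles toward 0.5 as the number of throws grows, while the difference between heads and tails is typically larger for the longer runs, although in any single run it may happen to be small.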

This possible behavior of the difference even with a perfect coin is some- 
thing that the layman seldom takes into account. When an individual 
has lost money on a game of chance in which his probability of winning 
is 0.5 he frequently continues to play on the assumption that the laws of 
probability guarantee that he will come out even in the long run. This 
assumption is fallacious for it should be obvious that a coin has no 
“memory” which can cause its future action to be governed by its past. 





The laws actually predict that the individual is quite likely to lose more 
and more. 

The experimental method is, of course, very widely used in situations 
where there can be no basis for even attempting to apply an analytical 
method. The experimental method is used in making mortality tables 
for the guidance of life insurance companies. From the number of deaths 
occurring in each age group in the past these companies can predict with 
reasonable accuracy their risk in insuring the lives of individuals. In 
this case, fortunately for the insurance companies, and presumably because 
of continued progress in medicine, the method has generally predicted 
higher death rates than have been experienced. The analytical method 
could not be used in this case. 

Companies that manufacture large quantities of their products frequently 
inspect only a small percentage of them. For the inspection of small quan- 
tities they can afford to use more careful methods than they could for the 
inspection of the entire output. Tests have shown that the inspection of 
properly chosen samples gives a more reliable determination of the uni- 
formity of the product than has been obtained by inspecting the entire 
output. 

6-3 COMPOUND PROBABILITIES: THE ANALYTICAL 
METHOD 

The analytical method of determining the probability of a given event is 
applicable when that event can be broken down into more basic independent 
events which have known probabilities. For example, when any perfect 
coin is shaken up and tossed on a carpet, the probability of it landing tail 
up is the same as for any other perfect coin and, in particular, has the 
value 0.5. If two perfect coins, e.g. a nickel and a penny, are tossed, the 
following four combinations are equally probable: head-head, head-tail, 
tail-head, and tail-tail. The probability of any one of these combinations 
occurring is then 1/4 or 0.25. Since two of these four combinations contain 
a head and a tail, the probability for such a combination is 1/2. Likewise 
the probability of not getting two heads is 3/4 since three of the four 
combinations do not contain two heads. 

As another example, consider the probability that two fours will come 
up when two dice are thrown. The probability of getting four on any single 
die is 1/6 so the probability of getting two fours with two dice is 
1/6 × 1/6 = 1/36. This can be seen from the fact that there are 36 different 
ways the two dice can fall and only one of these ways produces two fours. 
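
The count of 36 ways can be checked by listing them. The short Python sketch below enumerates every fall of two dice and compares the result of counting with the result of multiplying the single-die probabilities; the variable names are merely illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, repeat=2))       # 36 equally likely pairs
favorable = [o for o in outcomes if o == (4, 4)]

by_counting = Fraction(len(favorable), len(outcomes))
by_multiplying = Fraction(1, 6) * Fraction(1, 6)
print(by_counting, by_multiplying)
```

Both routes give the same fraction, as they must when the two dice fall independently.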

Events of the sort in the examples above shall be called compound 
events. The probability of a compound event happening is the product of the 
probabilities of the independent events that make up the compound event. 

The rule for compound probabilities can be shown more generally as 
follows. Suppose that the first event can happen in A ways and fail in 
B ways, all the ways being equally probable, while the second event can 
happen in A' ways and fail in B' ways, all of which are equally probable. 
The probability of the first event happening is 

$$\frac{A}{A + B}$$

and that of the second event happening is 

$$\frac{A'}{A' + B'}.$$

For each of the A ways the first can happen, the second can happen in 
A' ways. Therefore the number of ways both can happen is AA'. Likewise, 
for each of the (A + B) ways the first can either happen or fail, 
the second can either happen or fail in (A' + B') ways and, therefore, the 
two can either happen or fail in (A + B)(A' + B') ways. Thus the 
probability that both will happen is 

$$\frac{AA'}{(A + B)(A' + B')}.$$

This establishes the rule for the probability of a compound event made up 
of two independent events. 

Before considering events compounded from more than two simple 
events it is helpful to relate this discussion to the four different compound 
events that are possible with the two-coin example. The number of ways 
each of four events can happen is shown in Table 6-2 together with the 
probabilities of their happening. 

Table 6-2

   Compound event                      Number of ways   Probability
   (1) Both happen                     AA'              AA'/[(A + B)(A' + B')]
   (2) Both fail                       BB'              BB'/[(A + B)(A' + B')]
   (3) First happens, second fails     AB'              AB'/[(A + B)(A' + B')]
   (4) First fails, second happens     BA'              BA'/[(A + B)(A' + B')]
   Total (either happen or fail)       AA' + BB' + AB' + BA'
                                       = (A + B)(A' + B')          1

Here, the prime is assigned to one coin, no prime to the other. Each letter 
will have the value unity. “Happen” could mean the appearance of a head and 
“fail” the appearance of a tail, or vice versa, so long as consistency is 
maintained. As should be expected, since one of these four must happen, the 
sum of their probabilities is unity, which corresponds to certainty. It is 
seen that if one only specifies the event “one happens and one fails” rather 
than which is to happen and which is to fail, the latter being items 3 and 4 
of Table 6-2, then this event can happen in two ways. 

The proof can easily be extended to include compound events requiring 
the occurrence of any number of simple events, because this proof is valid 
when either of the two events is itself compound. For three events having 
probabilities of occurrence $p_1$, $p_2$, and $p_3$ the probability that the 
first two will occur is $p_1 p_2$. Next consider occurrence of the first two 
events as a single event that may be combined with the third. Thus the 
probability that all three occur is $(p_1 p_2)p_3 = p_1 p_2 p_3$. This 
process can be continued to include any number of events. 

6-4 PROBLEM OF n DICE 

Special consideration should be given to a particular type of compound 
event, those involving a large number of simple events each of which has 
the same probability of occurrence. Let us call this problem the “Problem 
of n dice,” because it is so easily illustrated and treated by considering n 
identical dice with identical markings. One should keep in mind that the 
solutions to this problem are applicable to a great many situations not 
involving dice. In particular, we will find it to be directly applicable to 
certain problems involving chance errors. 

Let us first work out a problem in which five identical dice are thrown. 
Assume that each of these dice has two of its six faces painted red while 
the rest are white, and that none of the faces have dots or other markings. 
If these five dice are shaken and thrown, it is obvious that the probability 
of all five coming up with red faces is 
$\frac{1}{3} \times \frac{1}{3} \times \frac{1}{3} \times \frac{1}{3} \times \frac{1}{3} = (\frac{1}{3})^5$ 
and the probability of them all coming up with white faces is 
$(\frac{2}{3})^5$. If one asks, “What is the probability of their coming up 
with two red faces and three white?”, the problem becomes more complicated. 
Although the combination containing either all red faces or all white faces 
can be achieved in only one way, the combination containing two red and 
three white faces can be achieved in ten different ways as shown in 
Table 6-3. The letter R in row 2 and column 7 indicates that the No. 2 die 
comes up with a red face in the 7th combination. No. 3 die is white in this 
combination. The probability of the first combination is 
$\frac{1}{3} \times \frac{1}{3} \times \frac{2}{3} \times \frac{2}{3} \times \frac{2}{3} = (\frac{1}{3})^2 (\frac{2}{3})^3$ 
and the probability of the second combination is 
$\frac{1}{3} \times \frac{2}{3} \times \frac{1}{3} \times \frac{2}{3} \times \frac{2}{3} = (\frac{1}{3})^2 (\frac{2}{3})^3$. 
Each of the ten combinations has the same probability of occurrence. Since 
there will be two red and three white faces if any one of the ten 
combinations occurs, the probability that just two red and three white 
faces come up is ten times the probability of occurrence of any one 
combination, or $10(\frac{1}{3})^2 (\frac{2}{3})^3$. 

Table 6-3

                        Combination
   Die     1    2    3    4    5    6    7    8    9    10
    1      R    R    R    R    W    W    W    W    W    W
    2      R    W    W    W    W    W    R    W    R    R
    3      W    R    W    W    W    R    W    R    R    W
    4      W    W    R    W    R    W    W    R    W    R
    5      W    W    W    R    R    R    R    W    W    W

The application of this treatment to events not involving dice is rather 
obvious. If five different events each have probabilities of occurrence of 
1/3, their probabilities of failing to occur will each be 2/3. The 
probability that two of these events will occur and the other three fail to 
occur is $10(\frac{1}{3})^2 (\frac{2}{3})^3$. 
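
The count of ten and the resulting probability can be verified by brute force. The following Python sketch lists every red-and-white pattern of five dice, keeps those with exactly two reds, and adds up their probabilities with exact fractions. The names used are illustrative only.

```python
from fractions import Fraction
from itertools import product

P_RED, P_WHITE = Fraction(1, 3), Fraction(2, 3)

total = Fraction(0)
patterns = []
for pattern in product("RW", repeat=5):          # every red/white pattern
    if pattern.count("R") == 2:
        patterns.append("".join(pattern))
        total += P_RED ** 2 * P_WHITE ** 3       # same value for each pattern
print(len(patterns), total)
```

The enumeration finds ten patterns, and the sum of their probabilities reduces to 80/243, or roughly one chance in three.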

The solution of the general problem follows the same procedure. Assume 
that there are n elements, and that a trial with any one of them has a 
probability of success p and a probability of failure q. Then p + q = 1. The 
probability of success with each element in a single, simultaneous trial of 
all n of them is $p^n$ and the probability that all fail is $q^n$. Again the 
problem will be more complex if one asks the probability of just r of the n 
events occurring and the other (n - r) events failing because, as with the 
examples, such a result can be achieved by various combinations. As before, 
each of these combinations has the same probability of occurrence, namely, 
$p^r q^{n-r}$. In order to find the answer to the problem in question it 
will be necessary to find the number of combinations that will produce the 
result desired, that is, the number which corresponds to our number 10 
above. In the problem of the dice above, 10 represents the total number of 
combinations possible when two things are selected from 5; there are 10 
different ways in which only two of the five painted dice can come up red. 
In the general case of n possible events one must find the number of 
different ways in which the particular r events he is investigating can 
occur. This number is usually referred to in the abbreviated form, “the 
combination of n things taken r at a time,” and is designated by the symbol 
C(n, r). The probability then that just r of the n equally probable events 
will occur and the other (n - r) events fail is 

$$\mathcal{P}(n, r) = C(n, r)\,p^r q^{n-r}. \tag{6-1}$$
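
Equation (6-1) translates directly into a few lines of Python. In the sketch below the function name is hypothetical, and the standard-library routine `math.comb` is used for C(n, r); exact fractions are used so that the check on the sum of the probabilities is not blurred by rounding.

```python
from fractions import Fraction
from math import comb


def probability_r_of_n(n, r, p):
    """Eq. (6-1): probability that exactly r of n independent events occur."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)


p = Fraction(1, 3)                       # the painted dice: red on 2 of 6 faces
distribution = [probability_r_of_n(5, r, p) for r in range(6)]
print(distribution[2])                   # two red faces out of five dice
print(sum(distribution))                 # all possible values of r together
```

The entry for r = 2 reproduces the result of the five-dice example, and the probabilities for r = 0 through 5 add up to unity, as they should since one of these outcomes must occur.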

The subject of combinations is well treated in most algebra texts. The 
principles involved also underlie the problem of arrangement, and are 
essential for much of the work in probability. Since probability in turn 
lies at the root of the study of chance errors in measurement, we shall 
treat all these related topics in the next paragraphs. 

6-5 PERMUTATIONS AND COMBINATIONS 

Suppose we have 5 toy building blocks lettered A, B, C, D, and E and a 
box divided into 3 compartments each of which will hold just one block. 
The number of different ways in which we can arrange the 5 blocks in the 
3 compartments is the “number of permutations of 5 things taken 3 at a 
time”; it is designated P(5, 3). On the other hand, if we have a bag that 
will hold just 3 of these blocks, the number of different bagfuls that can 
be selected from these five blocks will be less than the number of ways in 
which three blocks can be arranged in the box, since it is impossible to 
distinguish any ordering in the bag. This number of different bagfuls is 
the “number of combinations of 5 things taken 3 at a time,” and is 
designated C(5, 3). 

It is easier to calculate the equation for C(n, r) from the equation for 
P(n, r) than to calculate it directly. We shall, therefore, treat P(n, r) 
first. Consider the problem of calculating P(5, 3) by referring to Table 6-4. 
Any permutation must have one of the five blocks in the first compartment 
or space so that there are 5 choices for the first space. When this space is 
filled there are only 4 choices for the next space, but there are 4 
second-space choices for each of the 5 first-space choices. Thus there are 
5 × 4 = 20 choices for the first two spaces. When the first two spaces are 
filled there are only three blocks to choose from for the third space, but 
there are 3 third-space choices for each of the 20 choices for the first two 
spaces. Thus there are 20 × 3, or 5 × 4 × 3, choices for the three spaces. 
That is, P(5, 3) = 5 × 4 × 3 = 60. These 60 choices are listed in Table 6-4. 
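
The 60 arrangements, and the smaller number of bagfuls, can also be produced mechanically. The Python sketch below relies on the standard-library `itertools` routines; the variable names are illustrative.

```python
from itertools import combinations, permutations

blocks = "ABCDE"

# Ordered arrangements of 3 of the 5 blocks in the 3 compartments.
arrangements = ["".join(p) for p in permutations(blocks, 3)]

# Unordered selections of 3 of the 5 blocks, i.e. bagfuls.
bagfuls = ["".join(c) for c in combinations(blocks, 3)]

print(len(arrangements), len(bagfuls))
print(arrangements[:6])
```

The first few arrangements printed all begin with block A, which corresponds to fixing the first-space choice and running through the choices for the remaining two spaces.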

It is a simple matter to pass from the special case to the general, where 
there are r spaces to fill with n different things. In this case there are n 
choices for the first space and for each of these n choices there are (n - 1) 
choices for the second space. This gives n(n - 1) choices for the first two 
spaces. For the third space there are (n - 2) choices for each of the 
n(n - 1) choices for the first two spaces. At the rth space one finds that 
there are (n - r + 1) choices for each choice for the previous spaces. 
Thus 

$$P(n, r) = n(n - 1)(n - 2) \cdots (n - r + 1). \tag{6-2}$$

This is usually the simplest form for calculations,* but the equation can 

* This statement will be true when n and r are of the magnitudes implied by the 
examples being discussed. When n and r are very large, approximation methods, 
to be referred to later, become almost imperative. 
Table 6-4 

ABC   BAC   CAB   DAB   EAB 
ABD   BAD   CAD   DAC   EAC 
ABE   BAE   CAE   DAE   EAD 
ACB   BCA   CBA   DBA   EBA 
ACD   BCD   CBD   DBC   EBC 
ACE   BCE   CBE   DBE   EBD 
ADB   BDA   CDA   DCA   ECA 
ADC   BDC   CDB   DCB   ECB 
ADE   BDE   CDE   DCE   ECD 
AEB   BEA   CEA   DEA   EDA 
AEC   BEC   CEB   DEB   EDB 
AED   BED   CED   DEC   EDC 


be made more compact by multiplying the right-hand side by 

\frac{\text{factorial } (n - r)}{\text{factorial } (n - r)} = \frac{(n - r)!}{(n - r)!}, 

which is unity. Then Eq. (6-2) becomes 

P(n, r) = n(n - 1)(n - 2) \cdots (n - r + 1) \frac{(n - r)!}{(n - r)!} = \frac{n!}{(n - r)!}. 

In the above problem it was seen that any given combination of three 
letters such as A, B, and C can appear in several different permutations, 
CAB, BAC, CBA for example. There are always more permutations than 
combinations; in fact, the total number of permutations can be obtained 
by simply multiplying the number of combinations of n things taken r 
at a time by the number of ways those r things in each combination can be 
permuted, that is, by the number of permutations of r things taken r at 
a time. From Eq. (6-2) we see that P(r, r) = r!. Thus P(n, r) = r! C(n, r), 
or 

C(n, r) = \frac{P(n, r)}{r!} = \frac{n!}{(n - r)! \, r!}. (6-3) 
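
The relation between permutations and combinations can be verified for the block example with a few lines of Python. This is a minimal sketch under our own naming: the functions P and C are hypothetical helpers written to mirror Eqs. (6-2) and (6-3), and the enumeration by itertools is used only as an independent check.

```python
from itertools import combinations, permutations
from math import factorial

def P(n, r):
    """Number of permutations of n things taken r at a time, Eq. (6-2)."""
    return factorial(n) // factorial(n - r)

def C(n, r):
    """Number of combinations of n things taken r at a time, Eq. (6-3)."""
    return P(n, r) // factorial(r)

# The different bagfuls of 3 blocks that can be drawn from A..E.
bagfuls = list(combinations("ABCDE", 3))

print(P(5, 3), C(5, 3), len(bagfuls))

# Each bagful can be ordered in P(3, 3) = 3! ways, which recovers P(5, 3).
orderings_per_bagful = len(list(permutations("ABC")))
print(len(bagfuls) * orderings_per_bagful)
```

Because 0! is taken as 1 by math.factorial, the helpers also cover the special case n = r.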


In connection with the problem solved in Section 6-4, as well as for use 
in the next section, we note that the binomial expansion of (q + p)^n can 
be written as 

(q + p)^n = \sum_{r=0}^{n} C(n, r) q^{n-r} p^r. (6-4) 

Each term of this summation represents the probability of a particular 
compound event happening. The rth term is identical to Eq. (6-1) and 


* For these equations to be valid for the special case n = r, one must take 
0! = 1. The proof of the validity of this latter equation is outside the subject 
matter of this book. 
therefore gives the probability of just r of n events happening given that 
the probability of each happening is p and the probability of each failing 
is q. The probability of all events failing (r = 0) is C(n, 0) q^n p^0 = q^n, 
the probability of just one event occurring, for which r = 1, is 

C(n, 1) q^{n-1} p = n q^{n-1} p, 

and so on. In the summation there is one term for every possible number 
of the n events that can occur. Therefore, the sum of all these terms 
should be one, or certainty. Since q + p = 1, (q + p)^n = 1, so the 
summation does equal one. 
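
As a numerical illustration of this statement, the following Python sketch evaluates every term of Eq. (6-4) for one arbitrary choice of n and p and adds them up. The function name binomial_term and the particular values n = 10, p = 1/6 are assumptions chosen here for the example; any other values with q + p = 1 should serve equally well.

```python
from math import comb

def binomial_term(n, r, p):
    """Probability of exactly r successes among n elements (one term of Eq. 6-4)."""
    q = 1.0 - p
    return comb(n, r) * q ** (n - r) * p ** r

n, p = 10, 1.0 / 6.0
terms = [binomial_term(n, r, p) for r in range(n + 1)]

print(terms[0], (1 - p) ** n)                 # all events fail: q^n
print(terms[1], n * (1 - p) ** (n - 1) * p)   # exactly one event occurs
print(sum(terms))                             # should be 1 to within rounding
```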


6-6 THE BINOMIAL DISTRIBUTION 

Equation (6-4) represents what is known as the binomial distribution. 
We use the term “distribution” because through the variation in the 
values of r, the equation represents the distribution of probability among 
these events. We discussed the meaning of this term in the last section. 
We shall give two detailed examples here. 

Consider the tossing of 10 good coins and suppose that we want to 
know the number of heads which appear. We refer to each of the 10 coins 
as an element; the probability of success in a single trial of a single element 
is p = 1/2. The solution is attained by describing the probabilities of seeing 
no heads, 1 head, 2 heads, etc., in a single trial of the 10 elements. These 
probabilities are given by the successive terms of Eq. (6-4). For this 
example q is also 1/2 so that the products q^{n-r} p^r are always (1/2)^10 = 1/1024. 
To be more specific, the term for r = 0 is C(10, 0)(1/2)^10 (1/2)^0, that for 
r = 1 is C(10, 1)(1/2)^9 (1/2)^1, etc.; the value of C(10, r) is given by Eq. (6-3). 
Thus we can determine the binomial distribution 𝒫(10, r) by evaluating 
all such terms for r = 0 through 10. (See Fig. 6-1.) 

The distribution shown in Fig. 6-1 is symmetric, but this is not a general 
property of the binomial distribution; it is true only for this special case 
in which p = q = 1/2. For example, consider the rolling of 10 dice, with 
attention directed to the number of aces which appear. Here the probability 
of success in a single trial of a single element is p = 1/6 and, of course, 
q = 5/6. If exactly the same procedure is carried out for this problem as for 
the previous one, we will obtain Fig. 6-2. Evidently the distribution would 
be exactly the same if we were interested in any other single face of a die. 
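
The two distributions just described can be tabulated directly from Eq. (6-4). The sketch below is one possible way of doing so in Python; the function binomial_distribution and the simple symmetry test are our own constructions for illustration, not a procedure given in the text.

```python
from math import comb

def binomial_distribution(n, p):
    """Return the probabilities of r = 0, 1, ..., n successes."""
    q = 1.0 - p
    return [comb(n, r) * q ** (n - r) * p ** r for r in range(n + 1)]

coins = binomial_distribution(10, 1 / 2)   # heads among 10 coins
dice = binomial_distribution(10, 1 / 6)    # aces among 10 dice

print(" r   coins    dice")
for r, (pc, pd) in enumerate(zip(coins, dice)):
    print(f"{r:2d}  {pc:.4f}  {pd:.4f}")

# Symmetry test: the probability of r successes equals that of n - r successes.
coins_symmetric = all(abs(a - b) < 1e-12 for a, b in zip(coins, reversed(coins)))
dice_symmetric = all(abs(a - b) < 1e-12 for a, b in zip(dice, reversed(dice)))
print(coins_symmetric, dice_symmetric)
```

The printed columns correspond to the bar heights one would expect in Figs. 6-1 and 6-2 respectively.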

We can now ask, “What is the average number of occurrences of a single 
event, or what is the ‘expectation value’ for this event, in many trials 
with the n elements?” It is clear from Fig. 6-1 that the most probable 
number of heads per trial is 5, and from Fig. 6-2 that the most probable 
number of aces in a single roll of 10 dice is 1. However, the average numbers 
in each case are not necessarily the same as these most probable numbers. 
FIGURE 6-1. The bar at r = 6, for instance, has a height proportional to the 
probability of finding 6 heads when 10 good coins are tossed. Since the width 
of each bar is unity, the areas of the bars are also proportional to the probabilities. 

The term “average” implies that a large number N of trials must be 
made, each involving the n elements. The number of occurrences of the 
event in question must be added, and this sum must be divided by N. 

It is at this point that one of the basic problems in the treatment of 
chance errors can be introduced. It is evident that in any experiment, 
the number N must be finite. It is obviously foolish to consider carrying 
on the same experiment indefinitely; fortunately it is also not necessary to 
contemplate such a hopeless idea. One might almost say that the object 
of a book such as the present one is to demonstrate this latter fact. 

It is certainly true that if one were to carry out an averaging process 
with a finite number of trials, and then to repeat it, he could not expect 
to obtain exactly identical results. That is, a difference is always to be 
expected between an experimental average and the theoretical expectation 
value of an analytically determinable distribution of probability. The 
problem at hand, however, is to determine the expectation value for the 
binomial distribution; we shall come to the difference between it and an 
experimentally determined average later. 
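
The point that two finite averaging processes will not in general agree can be seen in a small simulation. The Python sketch below is purely illustrative: the function average_heads, the seed, and the choice of 200 trials per average are assumptions made here, and the pseudo-random generator merely stands in for real coins.

```python
import random

def average_heads(n_coins, n_trials, rng):
    """Average number of heads per trial over n_trials tosses of n_coins coins."""
    total = 0
    for _ in range(n_trials):
        # Each coin shows a head with probability 1/2.
        total += sum(rng.random() < 0.5 for _ in range(n_coins))
    return total / n_trials

rng = random.Random(12345)   # fixed seed so that the run is repeatable
first = average_heads(10, 200, rng)
second = average_heads(10, 200, rng)
print(first, second)
```

Both averages should lie near 5, but there is no reason for them to coincide with each other or with 5 itself.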
By its very definition, we can determine the expectation value by 
adding the products of each possible number of successes and the analy- 
tically defined probability that that number of successes will occur. Since 
the sum of the probabilities is one, no subsequent division by a number 
of trials is necessary. 

If the expectation value is called μ, then by the above definition, 

\mu = \sum_{r=0}^{n} r \, [C(n, r) q^{n-r} p^r]. 

The factor in brackets is not zero for any value of r. But the first term of 
the sum is zero since r itself is zero in this term. Thus 

\mu = \sum_{r=1}^{n} \frac{n!}{r! \, (n - r)!} \, r \, q^{n-r} p^r. 
Next we note that r/r! = 1/(r - 1)!. It is convenient then to let y = 
r - 1 and note that when r = n, y = n - 1. We shall designate this 
latter value of y by l. Then we have 

\mu = \sum_{y=0}^{l} \frac{n \, (l!)}{y! \, (l - y)!} \, q^{l-y} p^{y+1}, 

where n! has been factored into n[(n - 1)!]. If the fixed numbers n and 
p, factors which are the same in all terms of the sum, are brought out, we 
obtain 

\mu = np \sum_{y=0}^{l} \frac{l!}{y! \, (l - y)!} \, q^{l-y} p^{y}. 

But by Eqs. (6-1), (6-3), and (6-4) we find that this is just 

\mu = np (6-5) 

since (q + p)^l = 1 regardless of the value of l. 

In the symmetric distribution of Fig. 6-1, μ = 5, which is also the 
most probable number of heads. In fact, the analytical average, i.e., the 
expectation value, is identical to the most probable event for any symmetric, 
single-peaked distribution. In the case of the asymmetric distribution 
of Fig. 6-2, however, μ = 1.667, which, being nonintegral, is not 
one of the possible events. 
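
Equation (6-5) can be checked against the defining sum for the two examples of this section. The following is a minimal Python sketch; the function name expectation is hypothetical and the code simply adds the products of each number of successes and its probability.

```python
from math import comb

def expectation(n, p):
    """Sum of r times the probability of r successes, for r = 0..n."""
    q = 1.0 - p
    return sum(r * comb(n, r) * q ** (n - r) * p ** r for r in range(n + 1))

for label, n, p in [("10 coins", 10, 1 / 2), ("10 dice", 10, 1 / 6)]:
    # Direct sum over the distribution, followed by the product np.
    print(label, expectation(n, p), n * p)
```

In each case the two printed numbers should agree to within rounding error.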

The problem raised earlier concerning the differences between the 
expectation value and an experimental average must be discussed with 
reference to the width of the distribution. We shall defer this discussion 
to a later chapter. 


6-7 THE POISSON DISTRIBUTION 


The well-known and very important probability distribution function 
called the Poisson distribution can be easily derived from the binomial 
distribution. It is obtained when p, the probability of success in a single 
trial of a single element, is extremely small but the number of elements 
involved in a single experimental trial is so great that a measurable number 
of successes is assured. Thus the Poisson distribution is deduced from the 
binomial distribution by assuming that n increases without bound and p 
decreases toward zero, but in such a way that the product np, the expectation 
value μ, remains finite. 

The same script 𝒫 will be used to designate this new distribution, but 
it will be given in terms of μ and r instead of n and r. That is, 

\mathcal{P}(\mu, r) = \lim_{n \to \infty} \frac{n!}{(n - r)! \, n^r} \, \frac{\mu^r}{r!} \left(1 - \frac{\mu}{n}\right)^{n-r}, 

where we have used the facts that p = μ/n and q = 1 - p. Now, 

\frac{n!}{n^r (n - r)!} = \frac{n(n - 1)(n - 2) \cdots (n - r + 1)}{n^r} = 1 \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{r - 1}{n}\right), 

where the second form on the right-hand side is obtained by noting that 
there are just r factors in the numerator of the first form. As n gets larger 
without bound, this whole expression approaches one. 

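The other factor in 𝒫 which contains n is 

\left(1 - \frac{\mu}{n}\right)^{n-r} = 1 - (n - r)\frac{\mu}{n} + \frac{(n - r)(n - r - 1)}{2!}\left(\frac{\mu}{n}\right)^2 - \frac{(n - r)(n - r - 1)(n - r - 2)}{3!}\left(\frac{\mu}{n}\right)^3 + \cdots; 

we obtain this by binomial expansion. Shifting the n’s in the powers of 
(μ/n), we get 

\left(1 - \frac{\mu}{n}\right)^{n-r} = 1 - \left(1 - \frac{r}{n}\right)\mu + \left(1 - \frac{r}{n}\right)\left(1 - \frac{r + 1}{n}\right)\frac{\mu^2}{2!} - \cdots. 

Again, as n is increased without bound, this becomes 

\lim_{n \to \infty} \left(1 - \frac{\mu}{n}\right)^{n-r} = 1 - \mu + \frac{\mu^2}{2!} - \frac{\mu^3}{3!} + \cdots = e^{-\mu}. 
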

Hence the Poisson distribution is 

\mathcal{P}(\mu, r) = \frac{\mu^r}{r!} e^{-\mu}, (6-6) 


and it is to be noted again that, whereas the binomial distribution is given 
in terms of the number r of successes, the probability p of success in a 
single elementary trial, and the number n of elements per experimental 
trial, the Poisson distribution is given in terms of the number r of successes 
and the expectation value per experimental trial, μ. 
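
The passage from the binomial to the Poisson distribution can be watched numerically by holding the product np fixed while n grows. The Python sketch below is offered only as an illustration; the function names binomial and poisson, and the sample values μ = 6.5 and r = 6, are choices made here for the example.

```python
from math import comb, exp, factorial

def binomial(n, p, r):
    """Binomial probability of r successes among n elements."""
    return comb(n, r) * (1.0 - p) ** (n - r) * p ** r

def poisson(mu, r):
    """Poisson probability of r successes for expectation value mu, Eq. (6-6)."""
    return mu ** r * exp(-mu) / factorial(r)

mu, r = 6.5, 6
for n in (10, 100, 1000, 100000):
    p = mu / n                      # keep the product np fixed at mu
    print(n, binomial(n, p, r))
print("Poisson limit", poisson(mu, r))
```

As n is increased the binomial values should settle toward the Poisson value.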

Perhaps the most obvious application of the Poisson distribution is 
to the field of radioactive counting. Consider the case of a very long-lived 
alpha-emitting metal such as uranium. The probability p that a 
single atom of ₉₂U²³⁸ will “fire” in one second is only about 5 × 10^-18. 
On the other hand, a half milligram of this metal will contain about 
1.3 × 10^18 atoms. Thus we have here the case envisaged during the 
deduction of the Poisson distribution from the binomial distribution; n 
is very large and p is very small, but they are so related that the product 
np is finite and has the value 6.5. Suppose that p is the elementary 
probability of success, success being the observation of a decay or count, 
and that the single experimental trial is the observation of the 0.5 mg of 
uranium for one second. When we say that the expectation value is 6.5, we 
mean that if the numbers of decays observed to occur in many individual 
one-second intervals are added up and the sum divided by the number 
of such intervals, we will obtain a number close to 6.5 as the expected 
result. Again, as in the previous section, the discussion of how close to 6.5 
we may expect this result to be will be deferred to a later chapter. Here 
we shall discuss the analytically determinable values of the distribution. 

FIGURE 6-3. Abscissa: number of counts, r. 

With μ = 6.5, the probability that r counts will be observed in a one-second 
interval is given by 

\mathcal{P}(6.5, r) = \frac{(6.5)^r}{r!} e^{-6.5}. 

Values of this function are given in Table 6-5 and plotted in Fig. 6-3. 
We can make several observations about this figure: the distribution is not 
symmetric; and, as with the binomial distribution which described the 
rolling of 10 dice, the expectation value does not coincide with any of the 
actually possible observations, which can only be some integral number of 
counts per one-second interval. It is for this latter reason that, though the 
points in Fig. 6-3 are connected for illustrative purposes, a smooth curve 
is not drawn between them. 

Table 6-5 

 r    𝒫(6.5, r)      r    𝒫(6.5, r) 
 0     0.0015        7     0.1462 
 1     0.0098        8     0.1188 
 2     0.0318        9     0.0858 
 3     0.0688       10     0.0558 
 4     0.1118       11     0.0330 
 5     0.1454       12     0.0178 
 6     0.1575       13     0.0089 
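
The entries of Table 6-5 can be regenerated with very little arithmetic by using the recurrence 𝒫(μ, r + 1) = [μ/(r + 1)] 𝒫(μ, r), which follows from Eq. (6-6) and is quoted as a hint in Problem 11. The Python sketch below is one way to do this; the function name poisson_table is hypothetical, and small differences from the printed table in the last decimal place may arise from rounding.

```python
from math import exp

def poisson_table(mu, r_max):
    """Probabilities for r = 0..r_max, built with the recurrence
    P(mu, r + 1) = [mu / (r + 1)] * P(mu, r), starting from P(mu, 0) = exp(-mu)."""
    probs = [exp(-mu)]
    for r in range(r_max):
        probs.append(probs[-1] * mu / (r + 1))
    return probs

table = poisson_table(6.5, 13)
for r, prob in enumerate(table):
    print(f"{r:2d}  {prob:.4f}")
print("sum for r = 0..13:", round(sum(table), 4))
```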

It is seen, however, that the maximum value does fall near the expectation 
value. Connecting the points of Fig. 6-3 serves to suggest another 
property of this distribution, that, as μ increases, the fractional or relative 
difference between μ and the maximum decreases. Furthermore, if such a 
distribution is plotted for large values of μ with more closely spaced 
abscissa units so that the curves always lie on coordinate axes of about 
the same absolute size, the curves will appear smoother than the one 
depicted in Fig. 6-3. 

Another interesting observation, a numerical description of which will 
have to accompany the later discussion of expected differences between 
μ and experimental averages, is that when μ is 6.5 it is rather improbable 
that there would not be any count, or only one count, or 13 counts in a 
single one-second interval. One would have to make observations over a 
large number of one-second intervals to be reasonably sure of observing 
any intervals in which such large or such small counts occurred. Based 
on the values in Table 6-5, we could expect that in an observation of 10^4 
one-second intervals, only about 15 intervals would go by with no count, 
about 89 with 13 counts, but 1575 intervals would have six counts. 

If for these 10^4 trials, say, we had plotted in Fig. 6-3 the numbers 15, 
98, 318, etc., rather than the values of 𝒫(6.5, r), we would have a frequency 
distribution. To think of the problem in this way reemphasizes the difficulty 
just pointed out. While 10^4 one-second intervals constitute a 
sizable fraction of a working day, they are obviously not a sufficiently 
large number of intervals to provide us with exact experimental knowledge 
of the distribution; no matter how many observations are made, they are 
still not sufficient. The sum of the 𝒫(6.5, r) in Table 6-5 is 0.9929. In 10^4 
one-second intervals then, 9929 of them might be expected to contain a 
number of counts somewhere between zero and 13. Even if this expectation 
actually were observed, it is clear that of the 71 intervals which 
contained counts in numbers greater than 13 some one of them would 
contain the greatest number. If n is 1.3 × 10^18, we cannot expect that 
another 10^4 intervals would not include one which contained more than 
the previous maximum. One must always expect a difference between the 
experimentally observed distribution and the analytical distribution which 
is giving rise to the experimental distribution. 

Reference to the value of n also points out an obvious but in this case 
unimportant lack of rigor in the application of the Poisson distribution 
to an actual experiment in radioactive decay. If an observation of the 
0.5 mg of radioactive metal were carried out for the 10^4 one-second intervals 
and if the distribution of counts were found to be in accord with the 
probabilities listed in Table 6-5, then n would have been reduced by at 
least 63,953, which is 10^4 times the sum of the products r𝒫(6.5, r) for 
r = 1 through 13. Consequently, μ would not have been constant during the 
course of the experiment. Though the loss of 63,953 out of 1.3 × 10^18 would 
obviously be undetectable, we must remember that the above example 
was a special case. Most radioactive materials have a much higher value 
of p than that used here, and present-day scientists are able to manipulate 
much smaller amounts of material than 1.3 × 10^18 atoms. 
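
The numbers quoted in the last few paragraphs can be reproduced, to within the rounding of the tabulated values, by the short Python sketch below. The variable names are our own, and the probabilities are computed afresh from Eq. (6-6) rather than copied from Table 6-5, so the decay total may differ from 63,953 by a few units.

```python
from math import exp, factorial

mu = 6.5            # expectation value per one-second interval
intervals = 10**4   # number of one-second intervals observed
n_atoms = 1.3e18    # atoms in the 0.5-mg sample

probs = [mu ** r * exp(-mu) / factorial(r) for r in range(14)]

# Expected number of intervals showing exactly r counts, r = 0..13.
frequencies = [intervals * p for p in probs]
print([round(f) for f in frequencies])

# Expected number of intervals with more than 13 counts.
beyond_13 = intervals * (1 - sum(probs))
print(round(beyond_13))

# Total decays accounted for by r = 1..13, and the fractional loss of atoms.
decays = intervals * sum(r * p for r, p in enumerate(probs))
print(round(decays), decays / n_atoms)
```

The last number printed shows how small a fraction of the sample is consumed in such an observation.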


PROBLEMS 

1. Given P(n, 3) = 10 P(n, 2), find n. 

Answer: 12 

2. Given P(n, r) = 272, C(n, r) = 136, find n and r. 

Answer: n = 17, r = 2 

3. Five different positions are to be filled, and there are 20 different applicants 
each applying for any one of the positions. In how many ways can the 
positions be filled? 

Answer: 20!/15! 

4. In a certain town, there are 4 aldermen to be elected, and there are 8 
candidates. How many different tickets can be made up? 

Answer: 70 

5. In how many ways can a pack of 52 playing cards be divided into 4 hands 
so that each way will produce a different situation in a bridge game? Note 
that if any two players exchange hands a different situation is produced. 
Answer: (4!)(52!)/(13!)^4 

6. How many different baseball lineups of 9 men each can be chosen from 15 
players of whom 8 are qualified to play in the infield only, 5 in the outfield 
only, and 2 in any position (battery included in infield) ? 

Answer: 14 × (8!/2!) × (5!/2!) 

7. From a bag containing 5 black balls and 4 white balls, three are drawn at 
random. What is the probability that two are black and one is white? 
Answer: 10/21 

8. The letters of “front” are shaken up in a bag and three are drawn, one at 
a time. 

(a) What is the probability that they will spell “ton” as drawn? 

(b) What is the probability that the three letters can be arranged to spell 
“ton”? 

Answer: (a) 1/60 (b) 1/10 

9. Find the probability of throwing one and only one ace in two trials with a 
single die. 

Answer: 10/36 

10. Find the probability of throwing: (a) exactly 3 aces in 5 trials with a single 
die; (b) at least 3 aces in the 5 trials. 

Answer: (a) 250/7776 (b) 276/7776 

11. Make a table similar to Table 6-5, but for the probability of getting various 
numbers of counts in 2-sec intervals from the 1/2-mg sample of ₉₂U²³⁸. 
[Hint: Note that, for the Poisson distribution, 𝒫(μ, r + 1) = [μ/(r + 1)] 𝒫(μ, r).] 

Typical answer: 0.0152 at r = 6, 0.1098 at r = 13 

12. Reproduce the result obtained in Problem 11 at r = 2 by using informa- 
tion from Table 6-5. 


13. Evaluate, for the Poisson distribution, log 𝒫(100, r) for r every two units 
from 94 through 106, and plot the results against r. Considering the shape 
of the curve, plot the same results against (99.5 - r)^2 and evaluate a 
and b in 

\mathcal{P} = a \exp\left[-\frac{(99.5 - r)^2}{b}\right]. 

(Use Item C of Appendix 3 as a source for factorials and their logarithms.) 
Answer: a = 0.040, b = 2μ 

(Note: If the work was done carefully, it will show up the inherent lack of 
symmetry in the Poisson distribution.) 

14. Repeat Problem 13 for the binomial distribution 𝒫(200, r) with p = q, 
except compare with (100 - r)^2. Use log (200!) = 374.897. 

Answer: a = 0.0564, b = 2npq 

15. Calculate the distribution to be expected with 6 tetrahedral “dice.” Cal- 
culate the expectation value for any particular face by averaging directly 
over the distribution and compare with Eq. (6-5). 



CHAPTER 7 

DISTRIBUTION OF CHANCE ERRORS 


The concept of distribution of probability was presented in the previous 
chapter. Such a distribution is a body of information by which one can 
know the relative likelihood of occurrence of events which differ in detail 
but are of a similar nature. The event, the appearance of one head in a 
toss of 10 coins, is different in detail from the appearance of two heads, 
but both of these are of a similar nature. On the other hand, one would 
not ordinarily describe by the same distribution such diverse events as a 
power failure in New York City and the impact of a meteor on the moon, 
even though a connection is conceivable. 

The “body of information” can take many forms as illustrated by 
Eq. (6-6), Table 6-5, and Fig. 6-3. Of these three the first contains the 
most information, and in fact, a mathematical description of a distribution 
is generally most useful and most convenient. 

The occurrence of an error of a particular size in a particular type of 
measurement is an event of the sort in which we are interested here. 
Consider the experiment used in Section 6-7 to illustrate the Poisson 
distribution. The object of such an experiment is usually the determination 
of the probability per second that a single atom will decay. To carry out 
the experiment, we must first weigh the sample, and next from the weight 
and molecular weight of the sample and Avogadro’s number, find n. Then 
we must divide μ by n to get the desired result. The estimation of μ 
must be made by dividing the total number of counts by the number of 
one-second intervals in which that number of counts was observed. In 
the language to be used through most of the remaining chapters of this 
book, an event other than the one sought is what we mean by the term 
“error” and the size of the error is the difference between the observed 
event and the desired (i.e., the average or, perhaps, the most probable) 
event. This suggests that we can obtain an error distribution curve by a 
translation of the axis along which events were plotted in Fig. 6-3 to a 
position where the origin coincides with the desired event. This new curve 
gives us the error distribution, a picture of the relative probabilities with 
which different errors can be expected to occur. Thus in the example of 
the last chapter we can think of the occurrence of 11 counts in a one- 
second interval as a positive error and the occurrence of 3 counts as a 
negative error. 

The distribution which is most widely useful for the treatment of errors 
in scientific measurements is neither of the two introduced so far. We 
mentioned in Section 5-7 that when systematic errors have been reduced 
to a minimum, there will be residual errors for which no explanation can 
be found. Fortunately, these errors follow the laws of chance so that 
although they prevent us from having absolutely certain knowledge, a 
proper consideration of them will lead us to reliable knowledge about the 
quantities being measured, as well as giving us an idea of the extent of this 
reliability. In most cases the errors will be just as likely to be positive 
as negative, and will usually follow the so-called “normal-distribution 
law,” to be introduced in mathematical form in the next chapter; its 
expected graphical appearance is illustrated in this chapter. However, 
there are types of measurements in which the chance errors are not nor- 
mally distributed; the distributions may be symmetrical or nonsym- 
metrical — that is, in the latter case the errors may be more likely positive 
than negative, or vice versa. Nonsymmetrical distributions are called 
skew. Skew distributions are difficult to treat, but are generally treated 
by reference to the normal-distribution law because distributions of 
averages of groups of measurements from non-normal distributions tend 
to approach the normal-distribution law as the numbers of measurements 
included in the groups increase [3]. This important fact will be considered 
more completely later. 


7-1 EXAMPLES OF ERROR DISTRIBUTIONS 

To further understand the meaning of a distribution of errors and the 
properties of these distributions, it is worth while to perform a simple 
experiment in which a normal distribution is usually found. It is desirable 
in an experiment for this purpose that accuracy be more easily achievable 
in the reading than in the setting of the instrument. We shall describe 
two such experiments here: one is suitable for laboratory use in a formal 
class; the other can be done by one person with very simple equipment. 

For the first one, set up a lighted object and a lens corrected for chromatic 
aberration on an optical bench so that the image of the lighted object 
will be between 20 and 40 cm from the lens. The experiment should be 
performed by two people, an observer and a recorder. The observer moves 
the screen until the image on it appears sharp to him and the recorder 
estimates the position of the screen to the nearest 1/10 mm and records it. 
The observer then moves the screen well out of focus and oscillates it back 
and forth to find a new position, which the recorder records in the same 
way. This procedure is repeated until the desired number of readings are 
obtained. For the result to be useful, it is desirable to obtain at least 500 
and preferably 1000 readings with equal care and skill. Ideally, of course, 
all measurements should be made by one person so that the same care 
and skill are applied to each reading. However, students in a class will 
usually be found to have nearly the same ability in focusing. Thus very 
useful results can be obtained by combining comparatively smaller num- 
bers of observations obtained by several observers. Each member of the 
class may act once as recorder and once as observer. It is necessary that 
every reading be recorded. Occasionally there will be a reading that seems 
quite far from the main group, and one may be tempted to conclude that 
it is a mistake and discard it. This should not be done, for reasons which 
we shall discuss later. 

We shall now describe a method of presenting the observations which 
is not only excellent for instructional purposes but is also frequently used 
for the formal presentation of scientific data. This is the construction 
of a histogram. We imagine that the scale on which the readings are taken 
is divided into intervals of equal length such that the locations of the 
division marks between intervals are specified to one more significant 
figure (with that figure being a 5) than is given in the readings. Thus if 
readings are taken to the nearest tenth of a millimeter, an interval may 
lie from 688.25 mm to 688.75 mm, or from 688.25 mm to 689.55 mm. 
The intervals should be of such length that the one in which the maximum 
number of readings fall contains about 20% of the total. When all this 
has been decided, we count the number of readings which fall in each interval 
and calculate the corresponding fraction of the total. The fractions 
are then plotted as vertical bars at the proper intervals, where the height 
of a bar is proportional to the fraction, and the width of the bar is proportional 
to the length of the interval. A typical histogram is shown in 
Fig. 7-1. 

FIGURE 7-1 
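
The bookkeeping for such a histogram is easily mechanized. The Python sketch below is an illustration only: the readings are simulated with a pseudo-random generator and are not data from any experiment described here, and the interval length of 0.50 mm, the division mark at 688.25 mm, and the names interval_index, width, and origin are all assumptions made for the example. Positions are handled in hundredths of a millimeter so that division marks ending in 5 are exact and no reading can fall on a mark.

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical readings in mm, recorded to the nearest tenth of a millimeter.
# They are simulated purely for illustration; they are not data from the text.
readings = [round(random.gauss(688.5, 1.0), 1) for _ in range(500)]

width = 50        # interval length in hundredths of a mm (0.50 mm)
origin = 68825    # one division mark, at 688.25 mm, in hundredths of a mm

def interval_index(reading_mm):
    """Index of the interval containing a reading; interval 0 starts at 688.25 mm."""
    hundredths = round(reading_mm * 100)
    return (hundredths - origin) // width

counts = Counter(interval_index(x) for x in readings)
total = len(readings)

for k in sorted(counts):
    lo = (origin + k * width) / 100
    hi = lo + width / 100
    fraction = counts[k] / total
    # One '#' per percent gives a crude bar whose length is proportional to the fraction.
    print(f"{lo:7.2f} to {hi:7.2f} mm  {fraction:5.3f}  " + "#" * round(fraction * 100))
```

If the tallest bar holds much more or much less than about 20% of the readings, the interval length can be adjusted and the tally repeated.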

The procedure suggested for the less formal experiment is such that a 
histogram can be produced directly. The experiment is performed with a 
small reasonably heavy dart and a target which consists of 30 to 40 parallel 
lines ruled ^ in. apart on a piece of paper. The central line is marked zero 
and those to the sides marked 1, 2, 3, . . . and — 1, —2, —3, . . . The | in. 
spacing constitutes the interval in the abscissa for the histogram. 



The authors used a commercially available dart, which is sketched in 
Fig. 7-2. It should be well sharpened and a thread should be cemented 
against the shaft. Holding the dart by this thread when aiming it for the 
drop will produce a well-formed distribution more reliably than if it is 
held by the vanes. 

The experiment is performed by laying the target on a thick pad of 
newspaper on the floor with the lines stretching away from the experi- 
menter. It should be laid so that when the dart hits, it will hit over a 
place in the newspaper pad, near a fold, for instance, where there are air 
layers between the pages. If the dart hits a well packed pad, it will tend 
to bounce and the place of striking will be missed. By the time one is 
approaching the second hundred drops, this sort of behavior can be most 
exasperating. 

The reader should also be cautioned to always keep the algebraic signs 
of the coordinates in the target oriented in the same way. The dart is to 
be held by the thread above the target, and about 20 to 25 cm below the 
eyes while standing erect. With both eyes open, the experimenter aims 
the dart at the zero line and drops it. The interval in which the dart 
lands is the datum. One may well find that the peak of the distribution 
is to one side or the other of the zero line. This is the reason that mainte- 
nance of a constant orientation is necessary. 
We repeat here even more strongly the earlier caution about the temptation 
to discard the occasional widely dispersed readings. On the other hand, 
the dart may at times drop accidentally from one’s nervous fingers before 
it has been aimed. Such a drop should not be counted. 

Finally, from time to time it will appear at first glance that the dart 
has landed exactly on one of the lines. Careful examination will provide 
some basis on which to decide in which interval to include the hit. 


7-2 CHARACTERISTICS OF AN ERROR DISTRIBUTION 

The following discussion applies to either of the two experiments above, 
as well as to some more involved ones to which the ideas being introduced 
here will be extended later. It is convenient to refer specifically to one of 
the above experiments, and we shall choose the second. 

The object of the experiment was to determine the likelihood of hitting 
what one aims at. If the reader is like the kind assistant* who performed 
this experiment with Fig. 7-1 as the result, he will find that he is not at all 
likely to hit exactly what he aims at. On the other hand, the histogram 
clearly shows that there is some chance of hitting within certain bounds 
on each side of what is aimed at. The farther apart the bounds are, the 
more certain is one to drop the dart between them. Furthermore, it is 
clear that it is highly unlikely that a repetition of the experiment would 
reproduce Fig. 7-1 exactly. 

It is likely (though not certain) that the reader will find his distribution 
to be symmetric. If it is not exactly so, at least it will appear that it can 
turn out to be symmetric if more readings are taken. Although in most 
of the measurements that are made of physical properties the chance 
errors are symmetrical, there are types of measurements in which such 
symmetry should not be expected. A simple example will help to clarify 
this point. Consider the following modification of the optical experiment 
first proposed for illustrating the normal distribution of errors. Instead of 
keeping the lighted object and the lens fixed while observing the location 
of the screen for sharpest focus, we can keep the object and the screen 
fixed and observe the location of the lens. This experiment is most effective 
when the distance between object and screen is slightly greater than four 


* The authors would like to express their appreciation of the efforts of Jeanne 
Meisel. She dropped the dart well over the recorded 500 times. She took over 
the job as an inexperienced “guinea pig” to test the instructions given and to 
see whether she would produce a reasonably normal distribution. It is interesting 
and instructive to students that she had to learn to aim — her first 100 drops 
produced a wide, flat distribution. She started recounting when told that while 
any such distribution was worthy of discussion, hers was not what was wanted 
for the present purpose. 
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times the focal length of the lens; e.g., the object and screen being 1 m 
apart for a lens having a focal length of about 24.75 cm. Such a lens has 
two positions at which it makes a sharp image of the object : about 45 and 
55 cm from the lighted object. Obviously, the image will go out of focus 
more rapidly when this object distance is reduced below 45 cm than when 
it is increased above 45 cm. Careful observations of the lens position will 
show a greater spread of readings above than below 45 cm, and the distri- 
bution of chance errors will therefore be nonsymmetrical or skew. 
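The lens positions quoted above, and the lopsided way the image goes out of focus, can be checked with a short calculation. The following is only a sketch: it assumes the ideal thin-lens equation 1/u + 1/v = 1/f, which the text does not state explicitly, and the function names are hypothetical.

```python
import math

def sharp_focus_positions(focal_length, separation):
    """Object distances u that satisfy 1/u + 1/(separation - u) = 1/focal_length.

    Assumes an ideal thin lens; equivalent to u^2 - separation*u + f*separation = 0.
    """
    disc = separation**2 - 4.0 * focal_length * separation
    if disc < 0:
        raise ValueError("separation must be at least four focal lengths")
    root = math.sqrt(disc)
    return (separation - root) / 2.0, (separation + root) / 2.0

def defocus(u, focal_length, separation):
    """Signed distance from the sharp image plane to the fixed screen."""
    image_distance = focal_length * u / (u - focal_length)
    return image_distance - (separation - u)

# Values from the text: f = 24.75 cm, object and screen 100 cm apart.
near, far = sharp_focus_positions(24.75, 100.0)
print("sharp focus at", near, "and", far, "cm")

# Move the lens 1 cm to either side of the 45 cm position.
below = defocus(44.0, 24.75, 100.0)
above = defocus(46.0, 24.75, 100.0)
print("defocus at 44 cm:", below, " at 46 cm:", above)
```

Under this assumption the image plane misses the screen by a larger amount at 44 cm than at 46 cm, which is consistent with the greater spread of readings above 45 cm described in the text.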

Following is a summary of the characteristics of an error distribution: 

(a) Not every distribution is symmetric. 

(b) The degree of certainty with which an experimentally measured 
quantity is known is itself uncertain, as is the quantity. 

(c) Every additional reading changes, however slightly, the picture of 
the distribution of the errors. 

We recall that (b) and (c) are restatements of the discussions in Chapter 
6, where we distinguished between the expectation values of the binomial 
and the Poisson distributions and the results one might get by averaging 
the observations from some finite number of experiments. When the 
distribution is not known analytically in the sense of Chapter 6, the ex- 
pectation value is not known. It is important then to emphasize this 
distinction between a hypothetically “true” value for a measured number* 
and the value which, on the basis of the readings at hand, appears to be 
most likely. It is convenient at times to be able to discuss the “true” 
errors, which are the differences between the readings and the “true” 
values. But since the “true” value cannot be determined with certainty, 
the only errors which can be handled numerically will be what are called 
residuals — the differences between the readings and the value that those 
readings indicate to be the most likely. 

There are two important implications in the above remarks. One is that 
the readings must be used to determine the best value of the quantity 
being measured or of some function of this quantity, as well as to determine 
the error distribution. If the “true” value were known, the readings would 
only have to be used for the determination of the error distribution. The 
error distribution cannot be determined as well in the first case as in the 
second since more information would be available in the latter. 


* The question of “true” value does not arise nearly so obviously in the dart- 
dropping experiment as in the lens image experiment or in an experiment using 
a caliper, for instance. But even in the latter cases, a consideration of the 
differing effects of the temperature on different metals, of the breadth and 
roughness, on a microscopic scale, of the scale division marks, etc., tends to 
make these “true” values almost as ephemeral as the value of the number one 
hits when he aims a dart at zero. 
The other implication is that an error distribution must exist independently 
of the readings taken. The existence of such a distribution is 
less ephemeral than a “true” value of the quantity being measured. Things 
such as temperature or roughness of scales, which tend to make the 
existence of a “true” value unlikely, are just the ones that establish the 
existence of an error distribution. 

It is to be considered then that the readings taken are a sample of the 
infinite number that could be taken, just as one might check only a sample 
of wheat from a large storage bin. One does not examine every grain; he 
infers the general condition of most of the grains from appropriate 
samplings. 

It should be clear now why the reader was cautioned not to discard the 
occasional readings which seem so far from what they “should” be. Perhaps 
they are exactly what they should be for reasons pointed out in the dis- 
cussion of Table 6-5. There it was clear that the probability of observing 
some number of counts per second that departs significantly from the 
expectation value is low. Nevertheless there was a finite probability of 
observing these numbers of counts, and especially when a large number 
of observations is made, they are to be expected to turn up occasionally. 
The same reasoning applies here. 
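The argument can be put in numbers with the product rule of Section 6-3. The sketch below is illustrative only: the single-reading probability of 0.01 is an assumed value, not one taken from Table 6-5.

```python
def prob_at_least_one(p_single, n_readings):
    """Chance that at least one of n independent readings is a 'far' reading.

    The chance that none is far is (1 - p_single)**n_readings, by the
    product rule for independent events.
    """
    return 1.0 - (1.0 - p_single) ** n_readings

p_far = 0.01  # assumed probability that any one reading lies far from the mean
for n in (10, 100, 500):
    print(n, prob_at_least_one(p_far, n))
```

Even though a far reading is improbable on any single trial, it becomes more likely than not to appear somewhere in a long enough series.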

In later chapters, we shall take up the matter of discarding readings, 
but we will show that this may be done only after careful consideration of 
all the readings. That is, the question of how improbable it might be that 
a particular reading belongs in the universe being sampled must be answered 
by an examination of the set which includes that reading. It may well 
be that such examination will indicate that it is highly improbable that 
the universe being sampled contains that reading, in which case it may be 
discarded. 

There is a related remark to be added here. When a series of observations 
has been made to determine certain quantities, it is desirable to know 
the uncertainty in those numerical values which are rendered most probable 
by the existence of those particular observations. With one unknown, it is 
sometimes assumed that the difference between the largest and the smallest 
readings is the only safe, and consequently, sensible measure of that 
uncertainty. If the reader has performed either one or both of the experi- 
ments described, he will see that this is nonsense. It is clear that this 
difference can never be decreased by added readings; it can only increase 
or remain the same by the addition of more work. The only logical con- 
clusion from such an assumption is to take only one reading on any par- 
ticular quantity. 
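That the largest-minus-smallest difference can never shrink is easy to see by tracking it reading by reading. The sketch below uses simulated readings from Python's random-number generator as a stand-in for real data; they are hypothetical and serve only to exercise the function.

```python
import random

def running_range(readings):
    """Largest-minus-smallest reading after each successive reading."""
    lo = hi = readings[0]
    ranges = []
    for value in readings:
        lo = min(lo, value)
        hi = max(hi, value)
        ranges.append(hi - lo)
    return ranges

rng = random.Random(0)  # fixed seed so that the run is repeatable
readings = [rng.gauss(0.0, 1.0) for _ in range(500)]  # hypothetical readings
ranges = running_range(readings)
print(ranges[9], ranges[99], ranges[499])
```

Each entry of `ranges` is at least as large as the one before it, whatever the readings are, so more work can only make this measure of uncertainty look worse.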

While we would always like to quote a greatest error, we will usually 
find this hope frustrated. But even if such an error were found, it would 
be of little value if it was very large and of rare occurrence. What is of 
value is the shape of the error distribution curve, particularly within some 
reasonable range of its peak which might include from 70 to 90% of the 
number of readings that are taken. After some significant number of 
readings have been taken our knowledge of this shape is not altered 
greatly as the number of readings increases further. On the other hand, the 
addition of readings does continually increase the certainty with which the 
location of the peak is known. These qualitative statements will be made 
quantitative in the following chapters. 


PROBLEMS 

1. In the dart-dropping experiment which yielded the histogram of Fig. 7-1, 
it is supposed that the readings can be taken only to the nearest unit, so 
that all readings that fall between 5 and 6, say, are read as 5.5. 

(a) What is the probability that a single reading lies between the limits ±3? 

(b) What is the probability that a single reading lies outside the limits ±3? 

(c) What is the probability that four observations are $-2.5$, $-0.5$, $+0.5$, 
$+2.5$, in that order? 

(d) What is the probability that four observations are $-0.5$, $+0.5$, $-2.5$, 
$+2.5$, in that order? 

(e) Hence what is the probability that four observations have the values 
given in parts (c) and (d), regardless of the order? 

Answer: (a) 0.616 (b) 0.384 (c) $8.43 \times 10^{-5}$ (d) $8.43 \times 10^{-5}$ 

(e) $2.02 \times 10^{-3}$ 
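The step from parts (c) and (d) to part (e) can be checked numerically. Parts (c) and (d) are products of the same four probabilities and so are equal; four distinct values can be arranged in 4! orders, each equally probable. The variable names below are illustrative only.

```python
import math

p_one_order = 8.43e-5          # quoted answer to parts (c) and (d)
n_orders = math.factorial(4)   # number of orderings of four distinct values
p_any_order = n_orders * p_one_order
print(n_orders, p_any_order)
```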

2. It is found that in a certain type of measurement the chance is 0.5 that any 
reading will fall in region A and 0.25 in region B. If three readings are taken, 
what is the chance that (a) they all fall in region B, (b) the first two 
fall in region A and the last in B, (c) they all fall in either region A or B? 

Answer: (a) $(0.25)^3$ (b) $(0.5)^2 \times (0.25)$ (c) $(0.75)^3$ 

3. Perform an experiment similar to the dart-dropping experiment or the 
focusing experiment described in Section 7-1 and answer questions like those 
of Problem 1 above. 
THE NORMAL-DISTRIBUTION 
FUNCTION 


It was mentioned in the previous chapter that the histogram is excellent 
for instructional purposes. Its appearance suggests the existence of some 
continuous mathematical function, called the (real) error-distribution 
function, which describes the probability of occurrence of particular 
readings. This function $f(\xi)$ has the property that the probability that 
the error corresponding to a particular reading will lie in the interval 
between $x$ and $x + \Delta x$ is 

$$\mathcal{P}(x < \xi < x + \Delta x) = \int_{x}^{x+\Delta x} f(\xi)\, d\xi. \tag{8-1}$$

This probability is represented by the shaded area in Fig. 8-1. 

8-1 THE NORMAL-DISTRIBUTION FUNCTION 

There are many conceivable forms of error distribution, as illustrated in 
Chapters 6 and 7. It is our object in this chapter to find and examine the 
properties of a distribution function that will describe the results of 
experiments having error distributions similar to that of the dart-dropping 
experiment of the previous chapter, i.e., distributions having the general 
form shown in Fig. 8-1. The principal properties of this form suggested 
by the histogram of Fig. 7-1 are symmetry of the distribution about zero 
error and the improbability of very large errors. As discussed in Chapters 
6 and 7, this does not mean that very large errors can be known to have no 
probability of occurrence but only that they are very improbable. The 
smooth curve described by the mathematical form, or function, to be 
discussed in this chapter is called the normal- or Gaussian-error curve, 
and errors which are described by it are said to be normally distributed. 

It is generally assumed that there are real error distributions which 
have the form to be given here. There have been experiments designed 
to verify this assumption [4]; those of the preceding chapter were in fact 
so designed. But the verification is valid only within the bounds of another 
assumption, that the distribution obtained with very large numbers of 
readings approaches the “true” distribution. Although satisfactory 
agreement is generally obtained in such trials, it must be pointed out that 
the working physical scientist usually cannot afford the time, nor does he 
have the inclination, to go through the boring task of proving that his 
errors do follow this distribution by always taking 500 to 1000 readings. 
However, the consequences of assuming that they do are generally safe 
and certainly safer than those of making some highly unlikely assumptions, 
of guessing at some measure of the error distribution function, or of doing 
nothing at all about describing the error distribution. 

Fig. 8-1. The solid line represents the error distribution function; i.e., 
the ordinate gives the probability per unit error, as a function of the 
error, for the occurrence of an error of the size of the corresponding 
abscissa. Thus, the shaded area will represent the probability that an 
error will be found to lie between $x$ and $x + dx$. 

Various arguments have been used as a basis for a “derivation” of the 
normal-error distribution. Each of them is open to some sort of criticism, 
particularly by adherents of other methods or, especially, of other distribu- 
tions. Some of the derivations [5] are based on assumptions which are 
attempts to reflect physical conditions. Such a derivation arrives at the 
desired result with the consequence that the best value for a series of 
measurements on a single quantity is the arithmetic mean. Other deriva- 
tions [6] accept the latter as an axiom. Many authors prefer to adopt the 
normal error distribution without derivation, as the result of experience. 
A derivation will be given here ; the only claim we make for it is that it is 
illustrative of the arguments customarily used. 

It was emphasized earlier that not all observations follow the normal- 
error curve. Hence we might question the validity of attempting to imagine 
and use a “model of physical conditions” in a derivation of the normal- 
error curve. Actually it is enough to demonstrate that there are measure- 
ment procedures that yield data which follow the normal law with satis- 
factorily high probability, and that it cannot be shown that most of the 
data which are actually collected in everyday practice do not follow such 
a distribution. Consequently, to restate what we said earlier, the normal 
curve represents as good an approximation as is generally available. 

The characteristics of the relationship between an experiment and the 
error distribution affecting that experiment were described in Chapter 6, 
and in particular, in Section 7-2. It is assumed that the error distribution 
in question exists independently of the particular set of readings which 
are taken, and that the $\mathcal{P}$'s of Eq. (8-1) have a priori but unknown values.* 
Since the $\mathcal{P}$'s are unknown, we can only attempt to derive a function that 
will closely describe the readings obtained and has the properties 
mentioned. The derivation will be based on an attempt to associate the 
experimental readings with a distribution having the general form of 
Fig. 8-1 in such a way as to make the existence of the set of readings at 
hand more probable than for any other set of that size. This means that 
we must deduce $f(\xi)$ in the form of an equation which will accomplish this 
purpose. 

It is clear that this form of $f(\xi)$ implies the existence of a most probable 
value which is the same, in this case of a symmetric function, as the 
expectation value. Since the latter is not known, we let it be represented 
by $M_0$, to be derived from the $n$ readings $M_i$, $i = 1, \ldots, n$, and let 

$$v_i = M_i - M_0, \tag{8-2}$$

called the residuals, represent the true errors. It is the necessity of 
determining $M_0$ from the values of $M_i$ that represents the loss of information 
referred to in Section 7-2. While it is not known that $M_0$ corresponds to 
the origin of the horizontal scale of Fig. 8-1, the difference between two 
residuals, being just the difference between the corresponding readings, 
is nevertheless the same as the difference between the unknown true 
errors which correspond to those readings. Therefore increments on a 
scale of $v$ are the same as increments on a scale of $\xi$. Let $\Delta v$ ($= \Delta M$) be so 
chosen that there is no more than one of the total of $n$ readings in a single 
interval of width $\Delta v$. In Section 6-3 it was shown that the probability of 
the occurrence of $n$ independent events is equal to the product of the 
individual probabilities of occurrence of the individual events. Thus the 
probability of obtaining a particular set of $n$ observations can be written 
as 


$$P = \prod_{i=1}^{n} \int_{v_i - \Delta v/2}^{v_i + \Delta v/2} f(\xi)\, d\xi, \tag{8-3}$$

where the notation $\prod_{i=1}^{n}$ denotes the product of the factors with 
$i = 1$, $i = 2$, etc., up to $i = n$. For each $i = 1, \ldots, n$, the integral 
represents the probability of obtaining one of the readings, so that Eq. (8-3) 
is the probability of observing the complete set. In the limits of 
integration, $v_i$ is the abscissa at the middle of its interval. 


* Not only is the real error distribution unknown, but so also is the real error 
corresponding to a particular reading. 

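If $f(\xi)$ is expanded near $\xi = v_i$ by the use of Taylor’s theorem, it 
becomes 

$$f(\xi) = f(v_i) + \left.\frac{df}{d\xi}\right|_{v_i} (\xi - v_i) + \frac{1}{2} \left.\frac{d^2 f}{d\xi^2}\right|_{v_i} (\xi - v_i)^2 + \cdots$$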

When this expansion of $f(\xi)$ is used as the integrand, the term linear in 
$\xi - v_i$ integrates to zero over the symmetric interval, and we find that, 
for small $\Delta v$, powers of $\Delta v$ of 3 and greater are so small that they can 
be omitted. Then Eq. (8-3) becomes 

$$P = \prod_{i=1}^{n} f(v_i)\, \Delta v.$$

Here, a factor $\Delta v$ appears for each $f(v_i)$; there is no need for every interval 
to contain a reading. 
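The size of what is dropped in this step can be examined numerically. The sketch below assumes, purely as a test function, the normal form that this section eventually arrives at (with an arbitrary h = 1); any smooth density would serve. It compares the exact integral over one interval with the one-term approximation f(v) Δv.

```python
import math

def normal_density(x, h):
    """Assumed test density: (h / sqrt(pi)) * exp(-h^2 x^2)."""
    return h / math.sqrt(math.pi) * math.exp(-(h * x) ** 2)

def interval_probability(v, dv, h):
    """Exact integral of the test density over [v - dv/2, v + dv/2]."""
    return 0.5 * (math.erf(h * (v + dv / 2)) - math.erf(h * (v - dv / 2)))

h, v = 1.0, 0.5  # arbitrary illustrative values
errors = {}
for dv in (0.4, 0.2, 0.1):
    exact = interval_probability(v, dv, h)
    approx = normal_density(v, h) * dv
    errors[dv] = approx - exact
    print(dv, exact, approx, errors[dv])

# Halving the interval width should cut the discrepancy by about 2**3.
print(errors[0.2] / errors[0.1])
```

The discrepancy falls by roughly a factor of eight each time the interval is halved, as expected if the first neglected term is proportional to the cube of the interval width.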

In this discussion we are attempting to deduce the form of the function $f$ 
that will yield a prediction that $P$ is a maximum for the readings 
obtained. This problem is different from the determination of those residuals 
(that is, the determination of the value of $M_0$) that will produce a 
maximum value of $P$ when $f$ is known. 

The value of $\ln P$ reaches a maximum when $P$ does, and $\ln P$ is more 
convenient to use. Thus 

$$\ln P = \sum_{i=1}^{n} \ln f(v_i) + n \ln \Delta v. \tag{8-4}$$


In this expression $f(v_i)$ varies with $i$. We wish to know what form of 
dependence of $f$ on its argument will make $\ln P$ a maximum: that is, for 
what value of $\ln f(v_i)$ is $\delta \ln P$, the variation in $\ln P$, zero. The value of 
$\ln f(v_i)$ varies with varying values of $v_i$. Thus 

$$\delta \ln f(v_i) = \left.\frac{1}{f}\frac{df}{dv}\right|_{v_i} \delta v_i,$$

and we wish to have 

$$\sum_{i=1}^{n} \left.\frac{1}{f}\frac{df}{dv}\right|_{v_i} \delta v_i = 0. \tag{8-5}$$

If the individual residuals were all independent of each other, this 
requirement could be easily fulfilled by individually setting each of the 
$\delta v_i$, except $\delta v_j$ say, equal to zero and then setting $(1/f)(df/dv)|_{v_j}$ equal to 
zero, a process that could be carried out for $j = 1, \ldots, n$. However, the 
value of $M_0$, whatever it may be, is common to all the residuals, so that 
they are not all independent. Thus the variations considered above must 
be carried out in a manner consistent with the constancy of some other 
relation between them. It is easy at this point to also include another 
requirement, that the distribution function be symmetrical. If $f(v) = f(-v)$, 
we can describe the situation by considering $f$ as a function of an 
even power of $v$; we choose $v^2$. Thus the variation in the individual values 
of $\ln f$ must be carried out to satisfy 

$$\sum_{i=1}^{n} v_i^2 = C,$$

where $C$ is some constant. Therefore, in addition to Eq. (8-5), the 
variations in the residuals must also satisfy the equation 

$$\sum_{i=1}^{n} v_i\, \delta v_i = 0. \tag{8-6}$$

It is certainly true that if $f$ depended on any even power of $v$, it would 
be symmetric about $v = 0$. The second power is the simplest that satisfies 
the requirement; more importantly, it also leads to the normal distribution, 
and as we mentioned earlier, most experiments not known to obey 
some other distribution such as the Poisson have been found to obey the 
normal distribution when very large numbers of readings have been 
taken in order to test this point. 

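Suppose now that Eq. (8-6) is solved for the variation in one of the 
residuals, $\delta v_1$ for instance. Then 

$$\delta v_1 = -\frac{v_2}{v_1}\, \delta v_2 - \frac{v_3}{v_1}\, \delta v_3 - \cdots$$

If we substitute this into Eq. (8-5), we obtain 

$$\left[ \left.\frac{1}{f}\frac{df}{dv}\right|_{v_2} - \frac{v_2}{v_1} \left.\frac{1}{f}\frac{df}{dv}\right|_{v_1} \right] \delta v_2 + \left[ \left.\frac{1}{f}\frac{df}{dv}\right|_{v_3} - \frac{v_3}{v_1} \left.\frac{1}{f}\frac{df}{dv}\right|_{v_1} \right] \delta v_3 + \cdots = 0.$$

Since one of the variations, $\delta v_1$, has been eliminated corresponding to the 
one fixed relation which must exist between the $\delta v_i$ (because $M_0$ is 
common to all the residuals), the remainder of the variations are independent 
of each other. Thus the above equation can be satisfied by setting the 
coefficients of the various $\delta v_i$ equal to zero individually. Therefore we 
have 

$$\left.\frac{1}{v_1 f}\frac{df}{dv}\right|_{v_1} = \left.\frac{1}{v_2 f}\frac{df}{dv}\right|_{v_2}, \qquad \left.\frac{1}{v_1 f}\frac{df}{dv}\right|_{v_1} = \left.\frac{1}{v_3 f}\frac{df}{dv}\right|_{v_3}, \qquad \text{etc.}$$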
But these relations say that regardless of the particular residual $v_i$, $i \geq 2$, 
for which it is evaluated, $(vf)^{-1}(df/dv)$ must always equal some constant, 
which happens to be written here as $(vf)^{-1}(df/dv)$ evaluated for $v_1$. It is 
convenient to rewrite this constant as $-2h^2$. We then conclude that the 
function $f$ must have a form such that 

$$\frac{1}{f}\frac{df}{dv} = -2h^2 v. \tag{8-7}$$

It is clear that the constant must be negative in order that 
$\mathcal{P}(v < \xi < v + \Delta v)$ decreases as $v^2$ increases; reference to Fig. 7-1 or 
Fig. 8-1 shows that when $v$ is positive, $df/dv$ is negative, and vice versa. 

Integration of Eq. (8-7) yields 

$$f = k e^{-h^2 v^2}, \tag{8-8}$$

where $k$ is an arbitrary constant of integration. 
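The integration can be written out in one line, since the left side of Eq. (8-7) is the derivative of $\ln f$. The constant of integration is written here as $\ln k$, a choice of notation made only so that the result matches Eq. (8-8):

```latex
\frac{1}{f}\frac{df}{dv} = \frac{d}{dv}\ln f = -2h^2 v
\quad\Longrightarrow\quad
\ln f = -h^2 v^2 + \ln k
\quad\Longrightarrow\quad
f = k\,e^{-h^2 v^2}.
```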

Equation (8-8) will represent a smooth, continuous mathematical function 
if the same constant of integration can be used to relate $f$ and $v^2$ for 
every $v^2$. That distribution function which is called the normal-distribution 
function of real errors is then the function of $x^2$ which results when the 
same value of $k$ is assumed to apply to all observations; that is, the normal 
distribution function is 

$$f = k e^{-h^2 x^2}, \tag{8-9}$$

where $k$, like $h^2$, is a fixed constant for the distribution. 

We remember from Chapter 6 that the sum of the probabilities of every 
conceivable event covered by either the binomial or the Poisson distribution 
must be unity. The same thing must be true here. The only difference 
is that, with this smooth mathematical function, integration replaces the 
process of summation used previously, and as we see from Eq. (8-9), 
the range of events is from $-\infty$ to $+\infty$. Therefore 

$$\int_{-\infty}^{+\infty} k e^{-h^2 x^2}\, dx = 1. \tag{8-10}$$

Since this integral can be evaluated, we can obtain a relationship between 
the constants $k$ and $h$. In Appendix 1 it is shown that $\int_0^\infty e^{-t^2}\, dt$ is equal 
to $\sqrt{\pi}/2$. Hence, with $t = hx$, 

$$\frac{2k}{h} \cdot \frac{\sqrt{\pi}}{2} = \frac{k\sqrt{\pi}}{h} = 1. \tag{8-11}$$
Thus the normal-distribution function can be written as 

$$f(x) = \frac{h}{\sqrt{\pi}}\, e^{-h^2 x^2}, \tag{8-12}$$

which now contains only one adjustable constant $h$ which determines 
the precision of the observations. Figure 8-2 shows graphs of the 
distribution function for different values of $h$. We note that the $y$-intercepts 
are $h/\sqrt{\pi}$. Therefore the maximum heights of the curves are proportional 
to $h$. Since the area under each curve is one, the curves having high 
intercepts on the $y$-axis are very narrow, corresponding to a very narrow 
distribution of errors, whereas those with low $y$-intercepts have widely 
distributed errors. The number $h$ is called the “measure of precision” 
since a large $h$ means a narrow distribution of chance errors. 
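These properties of Eq. (8-12) are easy to confirm numerically. The sketch below is illustrative only: the values of h and the function names are arbitrary choices. It uses the standard relation that the integral of Eq. (8-12) from −a to +a equals erf(ha).

```python
import math

def normal_density(x, h):
    """Eq. (8-12): f(x) = (h / sqrt(pi)) * exp(-h^2 x^2)."""
    return h / math.sqrt(math.pi) * math.exp(-(h * x) ** 2)

def prob_within(a, h):
    """Probability that an error lies between -a and +a."""
    return math.erf(h * a)

def area_by_trapezoids(h, steps=20000):
    """Crude numerical check that the total area under the curve is one."""
    half_width = 10.0 / h          # the density is negligible beyond this
    dx = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_density(x, h)
    return total * dx

for h in (0.2, 1.0, 2.0):  # arbitrary illustrative values of h
    print(h, normal_density(0.0, h), area_by_trapezoids(h), prob_within(1.0, h))
```

The peak height scales with h while the area stays at one, and the probability of finding an error within a fixed distance of zero rises as h increases, which is the sense in which h measures precision.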



It was mentioned in the introductory remarks to the above deduction 
that the most probable value of an observation was the same as the expec- 
tation value when the distribution is symmetric. We demonstrated this 
earlier with a symmetric binomial distribution; we can also demonstrate 
this here. The importance of this fact is emphasized by recalling that the 
expectation value is equivalent to the arithmetic mean. 

In Chapter 6 we obtained the expectation value for an analytically 
known distribution by multiplying an event, or a reading, by the 
probability of its occurrence and taking the sum of this product over all possible 
events. In the present case, summation means integration. If $M_0$ is the 
most probable reading, then some other reading is 

$$M_i = M_0 + x_i$$

since by definition $x_i$ is the true error in $M_i$. Then the expectation value $\mu$ 
is 

$$\mu = \frac{h}{\sqrt{\pi}} \int_{-\infty}^{+\infty} (M_0 + x)\, e^{-h^2 x^2}\, dx. \tag{8-13}$$

Because of the symmetry of the distribution, the integral involving $x$ as a 
factor is zero; the positive values just cancel the negative values. From 
Eqs. (8-10) and (8-11) then we obtain 

$$\mu = M_0. \tag{8-14}$$

Figure 8-2 and the accompanying discussion pointed to $h^2$ as a measure 
of the width of this distribution; but as with the binomial and Poisson 
distributions, we shall defer further discussion of this most important 
topic until Chapter 9. Here we shall confine ourselves to an illustration. 


8-2 NORMAL DISTRIBUTION: AN ILLUSTRATIVE 
APPLICATION 

Either of the two experiments described in Chapter 7 will serve as a direct 
illustration of the normal-distribution function. We shall use the data 
given in Fig. 7-1 in the following comparison; readers should similarly 
work up their own data. 

FIGURE 8-3 (horizontal axis: $x^2$) 

First, we note that the height of each bar in the histogram of Fig. 7-1 
represents the probability that the dart will fall between two adjacent 
lines separated by a distance $\Delta x$, which is unity in this case. It has been 
frequently mentioned that one can never expect to define a distribution 
exactly with a finite number of readings. Anticipating the discussions of 
Chapter 10, we must fall back on Eq. (8-13) to proceed with the illustration; 
i.e., we consider all the observations which fell in a particular interval to 
have fallen at the center of that interval, and the ratio of the number of 
such observations to the total, 500, to be the probability of obtaining an 
observation in that interval. When we take the sum according to Eq. 
(8-13), of the products of the readings and their probabilities of occurrence, 

$$(0.5)(0.096) + (1.5)(0.110) + \cdots + (-0.5)(0.140) + (-1.5)(0.108) + \cdots,$$

we find that the expectation value is $-0.782$. Thus the error at a reading 
of 6.5, say, is 7.282. 
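The procedure just described is a weighted sum, sketched below. The five-bar histogram in the code is hypothetical and is NOT the data of Fig. 7-1, which is not reproduced here; only the final line uses numbers quoted in the text.

```python
def expectation_value(centers, probabilities):
    """Sum of (interval center) x (probability of that interval)."""
    return sum(c * p for c, p in zip(centers, probabilities))

# Hypothetical counts per unit-wide interval (NOT the Fig. 7-1 data).
centers = [-2.5, -1.5, -0.5, 0.5, 1.5]
counts = [5, 20, 40, 25, 10]
total = sum(counts)
probabilities = [n / total for n in counts]

mean = expectation_value(centers, probabilities)
residuals = [c - mean for c in centers]
print(mean, residuals)

# With the value quoted in the text, the error at a reading of 6.5 is:
error_at_6_5 = 6.5 - (-0.782)
print(error_at_6_5)
```

Each residual is simply the interval center minus the expectation value, which is how the figure 7.282 in the text arises from a reading of 6.5 and an expectation value of −0.782.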

For purposes of illustration, we will assume that the true distribution 
has been demonstrated by the 500 readings that were taken and that this 
distribution is normal. Thus we will use the designation $x$, which was 
reserved for the true error, to describe what are actually residuals. We are 
therefore to examine $\mathcal{P}(x < \xi < x + \Delta x)$, which it is convenient to call 
$\Delta P$. Here, $\Delta x$ is sufficiently small for our purpose that we can write* 

$$\Delta P = f(x)\, \Delta x = \frac{h}{\sqrt{\pi}}\, e^{-h^2 x^2}\, \Delta x.$$

As in our deduction of the form of the normal distribution we find it most 
convenient to use the equation 

$$\ln \Delta P = \ln f(x) = \ln \frac{h}{\sqrt{\pi}} - h^2 x^2, \tag{8-15}$$

since $\ln(\Delta x) = 0$. The data are plotted according to this equation in 
Fig. 8-3. Before discussing the considerable scatter which appears in the 
figure, we shall make some further rudimentary numerical analysis; 
rudimentary since we have not yet introduced the proper procedures for 
calculating those values of the parameters which are indicated by the 
data to be most probable; the parameter in this case is $h^2$. 

The intercept on the $\ln f(x)$ axis is clearly about $-2.1$, which according 
to Eq. (8-15) should be the value of $\ln(h/\sqrt{\pi})$. When this equality is 
assumed, the value of $h^2$ turns out to be 0.047, which is the negative of the 
slope in Eq. (8-15). This was the method used to find the line drawn in 
Fig. 8-3; it is a pretty fair representation of the data so long as $x^2$ is not 
too large. 
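The arithmetic connecting the intercept to the slope follows directly from Eq. (8-15) and can be sketched in a few lines; the variable names are illustrative only.

```python
import math

intercept = -2.1                                # read from Fig. 8-3, per the text
h = math.sqrt(math.pi) * math.exp(intercept)    # from ln(h / sqrt(pi)) = intercept
h_squared = h * h
print(h, h_squared)

def ln_f(x):
    """Straight line of Eq. (8-15) in the variable x^2."""
    return intercept - h_squared * x**2
```

The computed value of h squared rounds to the 0.047 quoted in the text.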


* The correlation of the observed values of AP with f(x) will be made more 
precisely in Chapter 9. 





We can assume that the considerable scatter is to some extent evidence 
supporting the often repeated warning that no finite number of observa- 
tions can determine an analytic distribution exactly. In this case, it is 
likely that a more important source of the scatter is that there is no fixed 
analytic distribution governing the observations. Certainly the observer 
did not make all 500 observations at one time, and variations in attentive- 
ness to aiming the dart, the state of her digestive processes, etc., all tended 
to shift the distribution. In fact, resting between groups of observations 
may well have produced a more sharply defined distribution than if all 500 
observations had been taken at one time. 

Special attention should be paid to the fact that there is more serious 
departure from the straight line at large values of $x^2$. The values of 
$\ln f(x)$ for large $x$ are seen to lie above the line; this points to another 
problem, that of fractional observations which, of course, do not exist. 
When $n$ observations are made under the “control” of a distribution $f(x)$, 
the number of observations expected in an interval $\Delta x$ at $x$ is $n f(x) \Delta x$, 
for small $\Delta x$. If this number turns out to be 0.4 at some $x$, and an 
observation falls in that interval, then the specification of a distribution being 
deduced experimentally is in error by a factor of 2.5 at that point. This 
error cannot be corrected unless an additional $1.5n$ observations are made 
without another occurring at that point. We shall discuss this problem 
further in later chapters, where the data of Figs. 7-1 and 8-3 will serve as a 
continual source of examples. 
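The factors 2.5 and 1.5 quoted above follow from a short calculation. Here $\hat{f}$ denotes the value of the distribution inferred from the single observation and $N$ the total number of observations needed for one observation to be the expected count; both symbols are introduced only for this sketch.

```latex
n f(x)\,\Delta x = 0.4, \qquad
\hat{f}(x) = \frac{1}{n\,\Delta x} = \frac{f(x)}{0.4} = 2.5\, f(x);
\qquad
N f(x)\,\Delta x = 1 \;\Longrightarrow\; N = \frac{n}{0.4} = 2.5\,n,
\qquad N - n = 1.5\,n .
```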

8-3 POISSON DISTRIBUTION FOR A LARGE 
EXPECTATION VALUE 

Comparison of the Poisson distribution $\mathcal{P}(6.5, r)$ shown in Fig. 6-3 with 
the histogram of Fig. 7-1, the idealization of the latter shown in Fig. 8-1, 
and the normal distributions of Fig. 8-2 yields some suggestive similarities. 
The distribution of Fig. 6-3 is reasonably symmetric for a small region 
about the most probable value of $r$, and the latter value is close to the 
expectation value 6.5. In this section we wish to examine these similarities 
by making an approximation to the Poisson distribution for a large 
expectation value. Since the principal emphasis of this book is on the estimation 
of expectation values for observations which are expected to follow the 
normal distribution, we shall make the approximation to the Poisson 
distribution as though the result were expected to become symmetric 
about the expectation value as the latter became larger. That property of 
the Poisson distribution which will prove this hope to be false could be 
demonstrated immediately, but we believe that the following order of 
presentation is more instructive. 

While not absolutely necessary at this point, it is convenient to use and 
accept without proof a formula obtained by more advanced mathematical 
methods than are generally being used here. This formula says that for 
large $r$, 

$$r! = r^r e^{-r} \sqrt{2\pi r} \tag{8-16}$$

is a good approximation [7]. 
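How good the approximation of Eq. (8-16) is can be judged by direct comparison with the exact factorial. The sketch below uses arbitrary test values of r and hypothetical function names.

```python
import math

def stirling(r):
    """Eq. (8-16): r! is approximately r^r * e^(-r) * sqrt(2 pi r)."""
    return r**r * math.exp(-r) * math.sqrt(2.0 * math.pi * r)

def relative_error(r):
    """Fraction by which Eq. (8-16) falls short of the exact factorial."""
    exact = math.factorial(r)
    return (exact - stirling(r)) / exact

for r in (5, 10, 50, 100):  # arbitrary test values
    print(r, relative_error(r))
```

The fractional error shrinks roughly in proportion to 1/r, so the formula is already good to better than one percent at r = 10.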

As stated in the preceding discussion, the region of our interest is 
around the expectation value. We shall keep it in view even while 
translating the origin to $\mu$, by letting 

$$x = r - \mu. \tag{8-17}$$

Then $x$, which is the difference between a particular reading and the 
expectation value, plays the role of the error. If Eq. (8-17) is used in 
Eq. (6-6), the latter becomes 

$$\mathcal{P}(\mu, x) = \frac{\mu^{\mu + x} e^{-\mu}}{(\mu + x)!}. \tag{8-18}$$

This is to be approximated for the case where $\mu$ is a large number but $x$ 
is not. In particular, the approximation is not expected to be good for 
negative values of $x$ with absolute values comparable to $\mu$. Under the 
allowed circumstances, $\mu + x$ will also be large, and Eq. (8-16) can be used 
in Eq. (8-18). After some obvious algebraic manipulation the latter 
becomes 

$$\mathcal{P}(\mu, x) = \frac{e^{x}}{\sqrt{2\pi\mu}} \left( \frac{\mu}{\mu + x} \right)^{\mu + x} \frac{1}{\sqrt{1 + x/\mu}},$$

which can be further reduced to 

$$\mathcal{P}(\mu, x) = \frac{e^{x}}{\sqrt{2\pi\mu}} \left( \frac{1}{1 + x/\mu} \right)^{\mu + x + 1/2}.$$

As is frequently the case, it is more convenient to work with $\ln \mathcal{P}$: 

$$\ln \mathcal{P}(\mu, x) = -\tfrac{1}{2} \ln 2\pi\mu + x - \left( \mu + x + \tfrac{1}{2} \right) \ln\left( 1 + \frac{x}{\mu} \right). \tag{8-19}$$

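If $\mu$ is large, then $x/\mu$ is small, and the logarithm in the last term can be 
expanded as follows: 

$$\ln\left( 1 + \frac{x}{\mu} \right) = \frac{x}{\mu} - \frac{1}{2} \left( \frac{x}{\mu} \right)^2 + \frac{1}{3} \left( \frac{x}{\mu} \right)^3 - \cdots$$

When this result is inserted into Eq. (8-19), the leading $x$ is cancelled, 
and the result is 

$$\ln \mathcal{P}(\mu, x) = -\tfrac{1}{2} \ln(2\pi\mu) - \frac{x^2}{2\mu} - \frac{x}{2\mu} + \text{terms involving } \frac{x}{\mu} \text{ raised to powers of 2 and greater.} \tag{8-20}$$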

We see that if we set 

$$h^2 = \frac{1}{2\mu}, \tag{8-21}$$

the first two terms of (8-20) represent a normal distribution. Furthermore, 
the terms in the higher powers of $x/\mu$ can be neglected when $\mu$ is large 
and $x \ll \mu$. However, the term $\tfrac{1}{2} x/\mu$ presents a problem when compared 
with $h^2 x^2$ for any of the smallest allowed values of $x$, i.e., for small 
integers $x$. 

The problem arises from the fact that, for the Poisson distribution, the 
most probable value does not continuously approach the expectation 
value as the latter becomes large; the distribution does not become 
symmetric to a sufficient extent that, as in the normal distribution, there is a 
single most probable value of $r$ which becomes coincident with the 
expectation value. We can see this by a more careful examination of Eq. (6-6). 
Let us suppose that $r_0$ is the single most probable value and that it is 
different from $\mu$ by an amount $\epsilon$: $r_0 = \mu + \epsilon$. By the definition of $r_0$, 

$$\mathcal{P}(r_0 \pm 1, \mu) < \mathcal{P}(r_0, \mu).$$

Then 

$$\mathcal{P}(r_0 + 1, \mu) = \frac{\mu^{r_0 + 1} e^{-\mu}}{(r_0 + 1)!} = \frac{\mu}{r_0 + 1} \cdot \frac{\mu^{r_0} e^{-\mu}}{r_0!} = \frac{\mu}{r_0 + 1}\, \mathcal{P}(r_0, \mu).$$

It is required that 

$$\frac{\mu}{r_0 + 1} < 1,$$

that is, $\mu < \mu + \epsilon + 1$, or 

$$0 < \epsilon + 1. \tag{8-22}$$

Also, 

$$\mathcal{P}(r_0 - 1, \mu) = \frac{\mu^{r_0 - 1} e^{-\mu}}{(r_0 - 1)!} = \frac{r_0}{\mu} \cdot \frac{\mu^{r_0} e^{-\mu}}{r_0!} = \frac{r_0}{\mu}\, \mathcal{P}(r_0, \mu).$$

Furthermore, 

$$\frac{r_0}{\mu} < 1,$$

or $\mu + \epsilon < \mu$, or 

$$\epsilon < 0. \tag{8-23}$$
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The inequalities (8-22) and (8-23) can only be satisfied when 

-1 < € < 0. 


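Now

$$\mathcal{P}(\mu, \mu) = \frac{\mu^{\mu} e^{-\mu}}{\mu!} = \frac{\mu^{\mu - 1} e^{-\mu}}{(\mu - 1)!} = \mathcal{P}(\mu - 1, \mu),$$

and since it has just been shown that $\mu - 1 < r_0 < \mu$, we choose to try

$$r_0 = \mu - \tfrac{1}{2}.$$

We now change the definition of the “error” from Eq. (8-17) to

$$y = r - r_0 = r - \mu + \tfrac{1}{2} = x + \tfrac{1}{2},$$

so that we can substitute

$$x = y - \tfrac{1}{2}$$

into Eq. (8-20) and ignore the higher order terms to get

$$\ln \mathcal{P}(\mu, y) = -\tfrac{1}{2}\ln(2\pi\mu) - \frac{y^{2}}{2\mu} + \frac{1}{8\mu}.$$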

The term $1/8\mu$ represents a negligible error in the total probability. If
$\mathcal{P}(\mu, y)$ is integrated over all values of $y$ from $-\infty$ to
$+\infty$, the result is $e^{1/8\mu}$ rather than unity. Since $\mu$ is very
large in this discussion, this result is only different from unity by about
$1/8\mu$. Furthermore, it can be shown that a slightly improved version of
Eq. (8-16) will lead to a reduction of this last constant term to $1/24\mu$.
Thus, the Poisson distribution for large values of $\mu$, when written in
terms of the difference between the observations and the most probable
observation, approximates a normal distribution about this most probable
observation with a precision index whose square is the reciprocal of twice
the expectation value of the Poisson distribution.*
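
The quality of this approximation can be checked numerically. The following is a minimal sketch, not part of the original text: it compares the exact Poisson probability with a normal form centred on $r_0 = \mu - \tfrac{1}{2}$ with $h^{2} = 1/2\mu$. The function names and the choice $\mu = 100$ are illustrative assumptions, and the small constant $1/8\mu$ is deliberately dropped.

```python
import math

def poisson_pmf(r, mu):
    """Exact Poisson probability mu**r * exp(-mu) / r!, computed through logarithms."""
    return math.exp(r * math.log(mu) - mu - math.lgamma(r + 1))

def normal_about_mode(r, mu):
    """Normal form centred on r0 = mu - 1/2 with h**2 = 1/(2*mu)."""
    y = r - (mu - 0.5)
    return math.exp(-y * y / (2.0 * mu)) / math.sqrt(2.0 * math.pi * mu)

mu = 100.0  # illustrative expectation value (an assumption)
for r in (90, 99, 100, 110):
    print(r, round(poisson_pmf(r, mu), 5), round(normal_about_mode(r, mu), 5))
```

For an integral $\mu$ the exact probabilities at $r = \mu - 1$ and $r = \mu$ coincide, which is why the normal curve is centred halfway between them. The agreement is best near the most probable value and degrades as $|x|$ grows toward $\mu$.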

PROBLEMS 

1. Equation (8-3) gives the probability of obtaining that particular set of 
distinct readings that were found. Show that if this probability had been 
written for the same readings, but without regard to the order in which they 
were obtained, the arguments of Section 8-1 would have led to the same 
result. 


* One sometimes sees the above discussion conclude with $\mu$ replaced by $r_0$.
Careful analysis with the improved version of Eq. (8-16) mentioned above shows
that the result obtained here yields a slightly more accurate value for the total
probability, but in either case the difference is negligible when $\mu \gg 1$.

2. The line drawn in Fig. 8-3 has the equation

$$\ln f(x) = -2.1 - 0.047x^{2}.$$

By using this instead of the actually observed probabilities from the
histogram of Fig. 7-1, answer the questions of Problem 1 of Chapter 7. Note
that $x$ here is the error as described in Section 8-2.

Answer: (a) 0.631 (b) 0.369 (c) $1.088 \times 10^{-4}$ (d) $1.088 \times 10^{-4}$
(e) $2.61 \times 10^{-3}$

3. If one had a distribution defined by the values of $f(2.5)$, $f(1.5)$, etc. found
in part (a) of Problem 2 and $f = 0$ for all other values of $x$, what would be
the expectation value? Note that the values of $f$ are now only relative
probabilities.

Answer: $\mu = -0.193$

4. Apply the arguments of Section 8-3 directly to the binomial distribution
for $p = q$ and large expectation value $\mu$, to find the distribution approached
under these circumstances. Keep terms only up to $x^{2}$ as in Section 8-3.

Answer: $\mathcal{P} = (1/\sqrt{\pi\mu})\,e^{-x^{2}/\mu}$



CHAPTER 9

MEASURES OF SPREAD


When we make a series of observations of some quantity, our principal 
object is to determine its most probable value or its average value; as we 
have seen, these two values are not always the same. It is also desir- 
able, however, to be able to describe the degree of certainty with which 
these values are known. This degree of certainty is determined by some 
measure of the width of the probability distribution, i.e., the spread in 
the observations. Several quantities have been devised as measures of 
spread, and we shall describe them in this chapter with the aid of ana- 
lytically known distributions. These quantities are defined as numbers 
which a reading has some specified probability of exceeding. This proba- 
bility may be 1 in 100, 1 in 20, 1 in about 3, or even 1 in 2. 

The last-mentioned measure of spread, a number that has one chance 
in two of being exceeded, was very popular a few years ago and is still 
used by a number of investigators. This number is known as the probable 
error, which is an unfortunate choice of name because it suggests to the 
uninitiated that this is the most probable error. But although such an 
interpretation leads to confusion, it is not desirable to change the name 
here since it is found in so much of the literature. 

When one considers measures of spread, it is important to distinguish 
between the spread of the individual readings and the uncertainty of the 
most probable value or of the average or other result to be determined. It 
was pointed out in Section 7-1 that averages of groups of readings tend 
to be normally distributed. This means that if readings are picked at ran- 
dom from a normal distribution and collected in groups of equal size, the 
averages of these groups will be found to be normally distributed. In fact, 
even if the infinite parent distribution is not normal, the averages of groups 
of individual observations collected at random tend to be more and more 
normally distributed as the numbers of readings in the groups increase. 
The distribution function for these averages will always be narrower than 
that for the individual readings. If h is the constant in the distribution 
function for individual readings, then $h_0$, the constant in the distribution
function for the averages, will be larger than $h$. Measures of spread
are needed for both of these distributions. We shall consider the distri- 
bution for individual readings in this chapter. 

By far the most important of the measures of spread is the standard 
deviation. It will be described first; it will also be the only measure of 
spread described for distributions other than the normal one. 


9-1 THE STANDARD DEVIATION 

The standard deviation $\sigma$ is best defined by the following equation:

$$\sigma^{2} = \lim_{n \to \infty} \frac{\sum x^{2}}{n}. \tag{9-1}$$

That is, $\sigma$ is the square root of the average of the squares of all the
errors described by the analytic distribution; in particular, these are errors
measured from the expectation value.

The expression for $\sigma^{2}$ in a binomial distribution can be deduced by the
same method as used in Chapter 6 to deduce the expectation value $\mu$
($= np$). If the error is $\mu - r$, then the probability of occurrence of this
particular error is $C(n, r)p^{r}q^{n-r}$. As described in Chapter 6, the average
of a particular quantity for an analytically determinable distribution need
not involve division of the sum over a frequency distribution by the
number of observations. The limiting result for this latter process is
obtained by summing or integrating the product of the quantity to be
averaged and the distribution, over the range of the distribution. Hence

$$\sigma^{2} = \sum_{r=0}^{n} (\mu - r)^{2}\, C(n, r) p^{r} q^{n-r},$$

which becomes

$$\sigma^{2} = \mu^{2} - 2\mu \sum_{r=0}^{n} r\, C(n, r) p^{r} q^{n-r} + \sum_{r=0}^{n} r^{2}\, C(n, r) p^{r} q^{n-r}$$


when we expand $(\mu - r)^{2}$. The first sum which appears in the above
expression is seen to be $\mu$. By applying the procedure used in Chapter 6
to the second sum we see that

$$\sum_{r=0}^{n} r^{2}\, C(n, r) p^{r} q^{n-r} = \sum_{y=0}^{l} (y + 1)\,\frac{n(l!)}{y!\,(l - y)!}\, p^{y+1} q^{l-y} = \sum_{y=0}^{l} y\,\frac{n(l!)}{y!\,(l - y)!}\, p^{y+1} q^{l-y} + \mu,$$

where $y = r - 1$ and $l = n - 1$. Repetition of the procedure on the remaining
sum shows it to be

$$np(lp) = np^{2}(n - 1).$$

When all terms are collected, we find that

$$\sigma^{2} = \mu^{2} - 2\mu^{2} + \mu^{2} - np^{2} + \mu = np(1 - p) = npq, \tag{9-2}$$

or for the binomial distribution,

$$\sigma = \sqrt{npq}. \tag{9-3}$$

We recall from Chapter 6 that the Poisson distribution was obtained
from the binomial distribution by considering the case where $n$ became
very large and $p$ very small while the expectation value $np$ retained some
fixed, finite, nonzero value. Hence we can determine $\sigma^{2}$ for the Poisson
distribution by examining Eq. (9-2) under these circumstances. The
first three terms, each of which has a definite, finite value, mutually cancel.
By the postulated conditions, the term

$$np^{2} = (np)p = \mu p$$

goes to zero. Hence for the Poisson distribution,

$$\sigma = \sqrt{np} = \sqrt{\mu}.$$
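
Both results can be verified by carrying out the defining sums directly. The sketch below is an illustration only; the function names, the truncation point `r_max`, and the sample parameter values are assumptions rather than anything prescribed by the text.

```python
import math

def binomial_variance(n, p):
    """Sum (mu - r)**2 * C(n, r) * p**r * q**(n - r) over r = 0..n."""
    q = 1.0 - p
    mu = n * p
    return sum((mu - r) ** 2 * math.comb(n, r) * p**r * q ** (n - r)
               for r in range(n + 1))

def poisson_variance(mu, r_max=200):
    """Sum (mu - r)**2 * mu**r * exp(-mu) / r!, truncated at r_max (an assumption)."""
    total = 0.0
    term = math.exp(-mu)  # probability of r = 0
    for r in range(r_max + 1):
        total += (mu - r) ** 2 * term
        term *= mu / (r + 1)  # step to the probability of r + 1
    return total

n, p = 10, 1.0 / 6.0  # illustrative binomial parameters
print(binomial_variance(n, p), n * p * (1.0 - p))
print(poisson_variance(6.5), 6.5)  # illustrative Poisson expectation value
```

In each printed pair the first number is the directly summed $\sigma^{2}$ and the second is the closed form, $npq$ or $\mu$.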

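In the case of the normal distribution, integration replaces summation,
but otherwise the procedure is exactly the same. That is,

$$\sigma^{2} = \int_{-\infty}^{\infty} x^{2}\,\frac{h}{\sqrt{\pi}}\, e^{-h^{2}x^{2}}\,dx = \frac{2h}{\sqrt{\pi}} \int_{0}^{\infty} x^{2} e^{-h^{2}x^{2}}\,dx.$$

In Appendix 2 we show that the value of the integral is $\sqrt{\pi}/4h^{3}$.
Hence

$$\sigma^{2} = \frac{1}{2h^{2}} \qquad \text{or} \qquad \sigma = \frac{1}{h\sqrt{2}} = \frac{0.707}{h}. \tag{9-4}$$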


While we have not discussed the rectangular distribution, it is easy to
insert here an evaluation of the standard deviation for this not uncommon
case. The rectangular distribution is one in which no readings occur
outside certain limits, $\pm d/2$ say, but all readings within these limits are
equally probable. Then

$$\sigma^{2} = \frac{1}{d} \int_{-d/2}^{d/2} x^{2}\,dx = \frac{d^{2}}{12},$$

or

$$\sigma = \frac{d}{2\sqrt{3}}.$$


9-2 PROBABLE ERROR


The probable error may be defined as that error which divides the area
of the positive side of the normal-distribution curve into two equal parts.
To state it in another way, there is a 50% probability that an observation
will produce an error of absolute value less than the probable error. For
the normal distribution the probable error $p$ can be defined by the integral

$$\int_{0}^{p} \frac{h}{\sqrt{\pi}}\, e^{-h^{2}x^{2}}\,dx = \frac{1}{4}. \tag{9-5}$$

From this definition, we can obtain $p$ as a function of $h$. Changing the
variable as in Appendix 1 to let $t = hx$, we obtain

$$\frac{1}{\sqrt{\pi}} \int_{0}^{hp} e^{-t^{2}}\,dt = \frac{1}{4},$$

which can be integrated by expanding the exponential in a power series.
Since

$$e^{u} = 1 + u + \frac{u^{2}}{2!} + \frac{u^{3}}{3!} + \cdots,$$

we have

$$e^{-t^{2}} = 1 - t^{2} + \frac{t^{4}}{2!} - \frac{t^{6}}{3!} + \cdots$$

and

$$\int_{0}^{hp} e^{-t^{2}}\,dt = hp - \frac{(hp)^{3}}{3} + \frac{(hp)^{5}}{2!\,5} - \cdots = \frac{\sqrt{\pi}}{4}.$$


This series converges very rapidly so that an approximate value can be
obtained from a small number of terms. We can use a method of
successive approximations by writing the equation as

$$hp = \frac{\sqrt{\pi}}{4} + \frac{(hp)^{3}}{3} - \frac{(hp)^{5}}{2!\,5} + \cdots,$$

where the first approximation is $hp = \sqrt{\pi}/4$. A more exact value may
be obtained by substituting this first approximation into the terms on
the right-hand side of the equation. Thus we find that

$$hp = 0.4769, \qquad p = \frac{0.4769}{h}; \tag{9-6}$$

the probable error is inversely proportional to the precision measure $h$.
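
The method of successive approximations just described can be sketched in a few lines. This is an illustrative implementation under stated assumptions: the function names, the number of series terms, and the iteration count are arbitrary choices, not taken from the text.

```python
import math

def series_tail(u, terms=12):
    """u**3/3 - u**5/(2!*5) + u**7/(3!*7) - ..., the terms moved to the right-hand side."""
    total = 0.0
    for k in range(1, terms + 1):
        total -= (-1) ** k * u ** (2 * k + 1) / (math.factorial(k) * (2 * k + 1))
    return total

def probable_error_times_h(iterations=40):
    """Iterate hp = sqrt(pi)/4 + tail(hp), starting from the first approximation."""
    first = math.sqrt(math.pi) / 4.0
    u = first
    for _ in range(iterations):
        u = first + series_tail(u)
    return u

hp = probable_error_times_h()
print(round(hp, 4))                    # the product h * p
print(round(hp * math.sqrt(2.0), 4))   # p / sigma, using sigma = 1/(h*sqrt(2)) from Eq. (9-4)
```

Each pass substitutes the previous estimate into the right-hand side. The iteration settles quickly because the correction terms change slowly near the solution.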
From Eqs. (9-4) and (9-6) we find that the relation between $p$ and $\sigma$
is

$$p = 0.6745\sigma. \tag{9-7}$$


9-3 AVERAGE DEVIATION


Another measure of spread is sometimes used because it can be calculated
rather quickly, although it is itself subject to large error. This is the
average deviation a.d., defined as

$$\text{a.d.} = \lim_{n \to \infty} \frac{\sum |x|}{n}.$$

In other words, it is equal to the average of the absolute values of all the
errors. To avoid the difficulties involved in taking absolute values
mathematically and since the curve is symmetrical, we make the calculation
only for the positive side of the curve, i.e., from $x = 0$ to $x = \infty$. The
area under the distribution in this latter range is just half that under the
entire curve. Thus the average deviation is given by

$$\text{a.d.} = 2 \int_{0}^{\infty} x\,\frac{h}{\sqrt{\pi}}\, e^{-h^{2}x^{2}}\,dx.$$

Since $d(h^{2}x^{2}) = 2h^{2}x\,dx$, the average deviation can be written as

$$\text{a.d.} = \frac{1}{h\sqrt{\pi}} \int_{0}^{\infty} e^{-h^{2}x^{2}}\,d(h^{2}x^{2}) = \frac{1}{h\sqrt{\pi}} = \frac{0.5642}{h}. \tag{9-8}$$

From Eqs. (9-8) and (9-4) we find that

$$\text{a.d.} = 0.7979\sigma. \tag{9-9}$$
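
As a numerical cross-check of Eqs. (9-4), (9-8), and (9-9), the averages can be evaluated by simple midpoint integration of the normal distribution function. The sketch below is illustrative only; the helper names, the integration limits, the step count, and the value of $h$ are assumptions.

```python
import math

def normal_density(x, h):
    """Normal distribution function (h / sqrt(pi)) * exp(-h**2 * x**2)."""
    return h / math.sqrt(math.pi) * math.exp(-(h * x) ** 2)

def average_over_normal(g, h, half_width=10.0, steps=20_000):
    """Midpoint-rule estimate of the integral of g(x) * density over -L..L, L = half_width/h."""
    a = -half_width / h
    dx = (2.0 * half_width / h) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * dx
        total += g(x) * normal_density(x, h)
    return total * dx

h = 0.217  # illustrative precision index (an assumption)
sigma = math.sqrt(average_over_normal(lambda x: x * x, h))
avg_dev = average_over_normal(abs, h)
print(sigma * h, avg_dev * h, avg_dev / sigma)
```

The three printed numbers correspond to the constants 0.707, 0.5642, and 0.7979, and they do not depend on the particular $h$ chosen.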


9-4 MEANING OF THE MEASURES OF SPREAD 

The probable error has been defined only for the normal-distribution 
function. It is the error which divides the area under the distribution 
function into two equal parts so that there is a 50% chance that any 
reading will have an error exceeding the probable error. It would be 
desirable to have this same kind of probability information on the other 
measures of spread. Actually it is only for a continuously defined, sym- 
metric function such as the normal function that one can obtain such 
fixed, easily described information. Before going on to obtain this informa- 
tion for the normal-distribution function, we shall use the examples of 
Chapter 6 to illustrate the situation with such distributions as the binomial 
and the Poisson functions. 

From Fig. 6-1, which describes the probability of getting various
numbers of heads, say from 0 to 10, in a toss of 10 coins, we see that the
most probable number and the mean are identical at 5, which is a possible
result of a toss. The probability of getting 5 ± 1, that is, the probability
of getting a 4 or a 5 or a 6, is 0.6563. With this distribution, which is
defined only at discrete points, it is not possible to have a result $p$ such
that the probability of getting some result within the range $\mu \pm p$ is
0.5. In like manner, $\sigma = \sqrt{10}/2$ is nonintegral so that $\mu \pm \sigma$
is not a possible result. Since $1 < \sigma < 2$, the value 0.6563 is the
probability of getting a result within the range $\mu \pm \sigma$, but this
particular value of the probability will change if $n$ is changed.

With the symmetric distribution discussed above, $\sigma$ has at least some
real usefulness; the probability of getting a result between $\mu$ and
$\mu + \sigma$ is the same as the probability for the range $\mu - \sigma$ to $\mu$.
With pronouncedly skew distributions, even this is not true. In the case of
Fig. 6-2, $\mu = 10(\tfrac{1}{6})$ is not integral and so is not a possible result.
Furthermore, the probability of getting a result between $\mu$ and $\mu + 1$,
say, is 0.29 while that of getting a result between $\mu - 1$ and $\mu$ is 0.32.
Since $\sigma$ ($= \sqrt{10(\tfrac{1}{6})(\tfrac{5}{6})}$) is 1.18, these probabilities
will turn out to have the same numerical values if the upper and lower limits
of the range in question are replaced by $\mu \pm \sigma$.

In this particular example, the distribution resembled the Poisson
distribution since $p$ was rather small. In Section 8-3 it was shown that the
Poisson distribution for large $\mu$ approached a normal distribution about
the most probable value, the standard deviation, as seen from Eqs. (8-20)
and (9-4), being still defined in terms of $\mu$. Such consideration emphasizes
the skewness of the binomial distribution of Fig. 6-2.

In the Poisson distribution of Fig. 6-3, however, $\sigma$ is again useful even
though $\mu$ is not very large. Considerable symmetry about the most
probable value 6 is evident. For this distribution $\sigma = \sqrt{6.5} = 2.55$.
The probability of occurrence of an observation within 6 ± 2.55 includes
the probabilities for observations 4 through 8, which add up to 0.680.
The probabilities for 4 and 5 counts add up to 0.257 and those for 7 and
8 to 0.265, sufficiently near each other to support the appearance of
symmetry in Fig. 6-3. If, however, the probabilities farther out on both
sides are included, skewness becomes much more noticeable. The sum
for 3, 4, and 5 is 0.326 and that for 7, 8, and 9 is 0.351.

In line with the discussion of Section 8-3, it is obvious that this numerical
symmetry will be lost if one discusses probabilities within the range
$\mu \pm \sigma$.
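
The sums quoted in this section are easy to reproduce. The following sketch is an illustration with assumed function names; it uses the ten-coin binomial distribution and a Poisson distribution with $\mu = 6.5$, the value implied by $\sigma = \sqrt{6.5}$.

```python
import math

def binomial_pmf(r, n, p):
    """Binomial probability C(n, r) * p**r * q**(n - r)."""
    return math.comb(n, r) * p**r * (1.0 - p) ** (n - r)

def poisson_pmf(r, mu):
    """Poisson probability mu**r * exp(-mu) / r!."""
    return mu**r * math.exp(-mu) / math.factorial(r)

# Ten coins: probability of 4, 5, or 6 heads.
coins = sum(binomial_pmf(r, 10, 0.5) for r in (4, 5, 6))

# Poisson distribution with mu = 6.5.
mu = 6.5
within = sum(poisson_pmf(r, mu) for r in range(4, 9))  # counts 4 through 8
below = sum(poisson_pmf(r, mu) for r in (4, 5))
above = sum(poisson_pmf(r, mu) for r in (7, 8))
print(f"{coins:.5f} {within:.4f} {below:.4f} {above:.4f}")
```

Changing the tuples of counts in the sums shows how quickly the two sides of the Poisson distribution drift apart as the range is widened.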

The binomial distribution has two parameters, $n$ and $p$. Since $q =
1 - p$, $q$ is known when $p$ is known. The Poisson and the normal
distributions each have only one parameter, $\mu$ for the former and $h^{2}$ for
the latter; $\sigma$ could be used for either of these. Distributions which
involve a single parameter are, of course, much easier to use than those which
involve more than one parameter. It is fortunate that a large majority
of the physical measurements which we do make can be handled
adequately by a single-parameter distribution. The discussion of the Poisson
distribution in Section 8-3 and in this section shows that even here the
conditions under which one can approximate it by a normal distribution
are not unduly severe. Properties of the normal distribution are copiously
tabulated, so that we shall restrict our more general discussion of the
determination of the probabilities of obtaining observations within
arbitrary limits to the normal distribution.

With this distribution, we can determine the probability of an error
lying between zero and any specified value $X$ by integrating the
distribution function from $x = 0$ to $x = X$:

$$\mathcal{P}(x < X) = \int_{0}^{X} \frac{h}{\sqrt{\pi}}\, e^{-h^{2}x^{2}}\,dx.$$

Tables could be calculated to give values of $\mathcal{P}(x < X)$ for various
values of $X$ and $h$. This would be a large task, and to be useful the tables
would have to fill a great many pages. However, if the integral is written in
terms of a new variable $y = x/\sigma$, then we can eliminate $h$ so that we need
only a relatively small number of integrations. Thus

$$\mathcal{P}(x < X) = \int_{0}^{X} \frac{h}{\sqrt{\pi}}\, e^{-h^{2}x^{2}}\,dx = \int_{0}^{X} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(1/2)(x/\sigma)^{2}}\,dx,$$

since $h^{2} = 1/2\sigma^{2}$, and hence

$$\mathcal{P}(x < X) = \frac{1}{\sqrt{2\pi}} \int_{0}^{t} e^{-y^{2}/2}\,dy, \tag{9-10}$$

where $t = X/\sigma$. We will refer to $\mathcal{P}(x < X)$ as $P_X$.

For small values of $t$, we can evaluate Eq. (9-10) by expanding $e^{-y^{2}/2}$ into
a series as we did in Section 9-2. For larger values of $t$, other methods
must be used. Table A at the end of this book gives values of $P_X$ for
values of $t = X/\sigma$ between 0.0 and 5.9.

For $t = 1.0$ Table A shows that $P_X = 0.34134$; i.e., there is a 34.13%
chance of an error landing between zero and $+\sigma$. Since there is also a
34.13% chance of an error landing between zero and $-\sigma$, the chance that
the absolute value of an error will be less than $\sigma$ is 68.27%, and the chance
that the absolute value will exceed $\sigma$ is 31.73%. This probability of
68.27% that an error will fall in the range $\mu \pm \sigma$ should be compared with
the corresponding 65.6% found for the binomial distribution of Fig. 6-1
and the 68.0% chance that an observation will lie in the range $r_0 \pm \sigma$
for the Poisson distribution of Fig. 6-3.
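
Where Table A is not at hand, Eq. (9-10) can be evaluated with the error function, since $P_X = \tfrac{1}{2}\,\mathrm{erf}(t/\sqrt{2})$. The sketch below is an illustration, not a replacement for the table; the function name `p_x` and the sample values of $t$ are assumptions.

```python
import math

def p_x(t):
    """Eq. (9-10): area under the unit normal curve between 0 and t."""
    return 0.5 * math.erf(t / math.sqrt(2.0))

for t in (1.0, 2.0, 3.0):
    inside = 2.0 * p_x(t)  # chance that |error| is less than t * sigma
    print(t, round(p_x(t), 5), round(100.0 * inside, 2), round(100.0 * (1.0 - inside), 2))
```

Each printed row gives $t$, $P_X$, the percentage chance that the absolute error is less than $t\sigma$, and the percentage chance that it exceeds $t\sigma$.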

When a given experimental result is being reported, it is common
practice to use either $\pm 2\sigma$ or $\pm 3\sigma$ as the limits of reliability. From
Table A it is found that $P_X$ is 0.47725 for $2\sigma$ and 0.49865 for $3\sigma$. The
chance that the absolute value of the error will exceed the $2\sigma$ limit is then
4.55% (or less than 1 in 20), and in the case of the $3\sigma$ limit it is 0.27%
(or less than 1 in 370). Limits wider than $3\sigma$ are seldom, if ever, used for
this purpose.

FIGURE 9-1


9-5 AN ILLUSTRATION 

The histogram of Fig. 7-1 is reproduced here as Fig. 9-1 with some addi- 
tions. Before describing these additions, we shall caution the reader again. 
The results of the dart-dropping experiment, given in the histogram of 
Fig. 7-1 and used to illustrate properties of the normal distribution in 
Fig. 8-3, are only illustrative. It must be clear to the reader that a repeti- 
tion of that experiment will not lead to exactly the same values for the 
mean and the standard deviation, so that we cannot say that the experi- 
ment has provided any analytically determinable distribution governing 
the relative frequency of the various observations. The histogram 
approached a normal distribution sufficiently closely to allow it to be used 
for these various illustrative purposes. Thus the mean M of the observa- 
tions is being used here as though it were known to be the expectation 
value of an analytically determinable normal distribution. 

In the same way, the value of $h^{2}$ ($= 0.047$) which was found from
Fig. 8-3 in a descriptive way is used here with Eq. (9-4) to determine a
value of $\sigma$, 3.26, with which we shall continue the illustrations. It is
important to realize next that the ordinates of a distribution function
were not obtained in this experiment. The numbers that were found were
relative values of shaded areas like the one shown in Fig. 8-1. For
comparison, such values of $\Delta P$, calculated as though they were governed by a
normal distribution with $\mu = -0.782$ and $\sigma = 3.26$, are shown as crosses
at the centers of the various intervals in Fig. 9-1. To illustrate the
procedure, we shall describe the calculation of the value plotted at +0.5.
This is to be the probability of finding an observation in the 0 to 1 range.
Since $\mu$ is −0.782, the error at 0 is +0.782, and that at 1 is +1.782. The
value of the variable $t$ of Eq. (9-10) is 0.240 at 0 and 0.546 at 1.0. From
Table A, by linear interpolation between the areas given at 0.54 and 0.55,
we find that the probability of there being a result between $t = 0$ and
$t = 0.546$ is 0.207. The probability of finding a result between $t = 0$
and $t = 0.240$ is 0.095. Consequently, the probability of finding a result
between 0.240 and 0.546 is the difference, 0.112, which is the value plotted.
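
The same calculation can be sketched without a table by using the error function in place of Table A. The function names here are illustrative assumptions; the values of the mean and the standard deviation are those quoted above. Because no intermediate rounding is done, the result agrees with the tabulated value 0.112 only to within rounding in the third decimal place.

```python
import math

def p_x(t):
    """Eq. (9-10) expressed through the error function."""
    return 0.5 * math.erf(t / math.sqrt(2.0))

def interval_probability(lo, hi, mu, sigma):
    """Probability that a normally distributed reading falls between lo and hi."""
    t_lo = (lo - mu) / sigma
    t_hi = (hi - mu) / sigma
    # p_x is odd in t, so this also works when the interval straddles mu.
    return p_x(t_hi) - p_x(t_lo)

mu, sigma = -0.782, 3.26  # values quoted in the text
delta_p = interval_probability(0.0, 1.0, mu, sigma)
print(round(delta_p, 4))
```

Repeating the call for each unit interval gives the full set of values of the kind plotted as crosses.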

It was seen in Fig. 8-3 that large residuals occurred with higher
probability than should be expected, were the distribution normal. With 500
observations of this sort, we can observe probabilities only in intervals of
1/500, or 0.002, and we can never realize an expectation of say 0.001. If
the expected probability is 0.001 and we make 500 observations, we can
never get closer than zero or 0.002 to the “correct” probability even if
the observations are controlled by a known normal distribution.

Let us assume that the value 3.26 used above for $\sigma$ is correct for the
normal distribution assumed to be controlling the experiment; this is a
reasonable value since it was determined by points in a range of probability
where 1/500 is a relatively small fraction of the observed probabilities. If
we now compare this with the $\sigma$ calculated from the definition of Eq. (9-1),
neglecting that part of the definition which says $n \to \infty$, we find that it is
smaller than $(\sum x^{2}/n)^{1/2}$ by 0.21. It appears then that in this trial there
was a small excess of observations at large deviations. We will refer to
this excess again in Chapter 10.


CHAPTER 10

METHOD OF LEAST SQUARES


The discussion of error or probability distributions has so far been con- 
fined to analytically determinable distributions. As in so much of science, 
the function of such discussions is not so much utilitarian as instructional. 
It is only rarely that the parameters of a particular distribution are known. 
In fact, they are never known when measurements of basic scientific 
interest are being made; if they were, there would be no point to making 
the measurements. Thus, out of the previous discussion we must extract 
some method of handling the distribution of a finite number of chance 
errors which will, hopefully, yield results close to reality. 

It is fortunate for the physical scientist, as opposed to the medical, 
biological, or social scientist, that the vast majority of his measurements 
appear to be drawn from distributions which have comparatively small 
standard deviations. Thus for him a relatively small number of observa- 
tions will define the distribution, and hence its mean, sufficiently closely 
for practical purposes. Furthermore, the physical scientist is usually able 
to proceed satisfactorily on the assumption that the parent distribution 
governing the distribution of readings in his experiment is normal, or 
Gaussian. The methods to be described in this chapter are also based on 
this same assumption. 

These methods, moreover, will be considered applicable regardless of 
the number of readings involved; there is not much else to do unless it is
to report each reading. Even if this is done, however, it should be the re- 
sponsibility of the physical scientist always to make some commonly used, 
definitely describable, and exactly repeatable estimation of his error dis- 
tribution. The method of least squares satisfies all these requirements. 

10-1 FUNDAMENTAL PRINCIPLE 

Since an error distribution can never be determined exactly by a finite 
number of readings, it is never possible to determine the true value of any 
quantity measured. One has to be content with that value which the
available set of observations indicates to be most probable. As defined
by Eq. (8-2) the residuals, which are the only available substitutes for 
the errors, are the differences between the readings and this most probable 
value of the quantity in question. 

To find the most probable value $M_0$ from a set of observations, say
$M_1, M_2, \ldots, M_n$, it is first necessary to know what one means by the
“most probable value.” The following definition has been accepted by
most authorities. The most probable value that can be obtained from a given
set of observations is the one which makes that set of observations most
probable.*

Thus it is necessary to calculate the probability of obtaining this set of
readings and then find the maximum for the expression. In this process
the quantity $M_0$ must be considered as the variable since the readings
$M_1, M_2, M_3, \ldots, M_n$ are now fixed, known quantities. According to the
principles set down in Section 6-3, the probability $P$ of obtaining this set
of readings is the product of the probabilities for the individual readings.
Thus

$$P = \Delta P_1 \cdot \Delta P_2 \cdots \Delta P_n,$$

where

$$\Delta P_i = \mathcal{P}(M_i - \mu < x < M_i - \mu + \Delta x).$$

If, as assumed, the $\Delta P_i$ are distributed normally, then

$$\Delta P_i = \frac{h}{\sqrt{\pi}}\, e^{-h^{2}x_i^{2}}\,\Delta x.$$

Therefore the probability of getting this set of readings is

$$P = \left(\frac{h}{\sqrt{\pi}}\right)^{n} (\Delta x)^{n} \exp\left(-h^{2} \sum x_i^{2}\right).$$

The method of least squares consists in estimating the expectation value
of this distribution by substituting residuals for the errors in the argument
of the exponential function,

$$v_i \approx x_i,$$

and adjusting $M_0$ until $P$ reaches the maximum. This process does not
affect $h$, $n$, or $\Delta x$ since, for example, $\Delta x$ is merely an interval along the
abscissa of the distribution, and it is immaterial whether this interval is
called $\Delta v$ or $\Delta x$. Since the argument of the exponential contains only
squared terms, the negative sign makes the exponent definitely negative.
Moreover, the residuals $v_i$ are the only variables, so that the greatest
value for $P$ is obtained when the exponent is as close to zero as possible.
The exponent cannot be zero unless all of the $v_i$ are zero. Therefore the
maximum value for $P$ is obtained when $\sum v_i^{2}$ is as small as possible. This
is the principle involved in the method of least squares. Note, however, the
more fundamental idea of finding the most probable value of the measured
quantity by making the existing readings the most probable readings.
This latter notion has a general applicability; the method of least squares
applies specifically to the normal error distribution.

* Here we are using a technique called the method of maximum likelihood,
which is just what the name implies; out of all the possible values of the
parameters necessary for the evaluation of the probability of the existence of
a set of observations, we search for those which make the probability a maximum.
The method is more pointedly used and discussed than it is here in those texts
which are specifically oriented toward an examination of the foundation of
statistical theory. See, for instance, Arley and Buch [8] or Lindgren [9].

This proof holds not only for the simple case just treated, in which
only one unknown was measured, but also in cases where any number of
unknowns are to be determined. In the particular example given, the
method of least squares merely shows that $M_0$ is the arithmetic mean of
the observations. We have

$$\sum v^{2} = (M_1 - M_0)^{2} + (M_2 - M_0)^{2} + \cdots + (M_n - M_0)^{2},$$

and for $\sum v^{2}$ to be a minimum it is necessary that

$$\frac{\partial \sum v^{2}}{\partial M_0} = -2(M_1 - M_0) - 2(M_2 - M_0) - \cdots - 2(M_n - M_0) = 0. \tag{10-1}$$

Thus

$$M_0 = \frac{M_1 + M_2 + \cdots + M_n}{n}.$$

In cases where there is more than one unknown, the results are more
complex.
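
The single-unknown result can be illustrated numerically: among trial values of $M_0$, the arithmetic mean gives the smallest sum of squared residuals. The readings below are hypothetical, and the grid scan is merely an illustrative device, not a method proposed in the text.

```python
def sum_squared_residuals(readings, m0):
    """Sum of (M_i - M0)**2 over all readings."""
    return sum((m - m0) ** 2 for m in readings)

readings = [9.79, 9.82, 9.80, 9.85, 9.77]  # hypothetical readings
mean = sum(readings) / len(readings)

# Scan trial values of M0 around the mean and keep the one with the smallest sum.
trials = [mean + 0.001 * k for k in range(-50, 51)]
best = min(trials, key=lambda m0: sum_squared_residuals(readings, m0))
print(mean, best, sum_squared_residuals(readings, best))
```

The residuals measured from the mean also add up to zero, which is just Eq. (10-1) divided by −2.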


10-2 AN EXAMPLE OF LEAST SQUARES WITH MORE 
THAN ONE UNKNOWN 

Frequently in science, observations must be made where more than one 
quantity is unknown. A simple example in which a set of observations is 
made on more than one unknown is the determination of the acceleration 
due to gravity g by direct observations of the position of a freely falling 
body at known instants of time. In the Behr free-fall apparatus,* a metal 
bob falls along and very close to a long strip of wax paper fastened to a
vertical metallic surface. The bob has an annular knife edge. A vertical
metallic bar, insulated from the metallic surface, is also mounted parallel
and close to the path of the bob. The vertical bar and vertical surface
are connected as electrodes to a device which produces strong sparks at
the end of specified time intervals. The time intervals are usually very
accurate because they are determined by the frequency of the a-c supply.
Each spark jumps from the bar through the bob, and then through the
wax paper to the metal surface. At the instant it jumps, the spark leaves
a fine pin hole in the wax paper at the location of the annular knife edge
on the bob.

* R. L. Edwards, Am. Phys. Teacher (now the Am. J. Phys.), 1, 6 (1933).

Experiments of this type are performed in most elementary physics
laboratories. It is a mistake to analyze the results by assuming that the
bob starts from rest at the beginning of the first interval and that the
positions of the bob at the end of each subsequent interval are given by
$S = \tfrac{1}{2}gt^{2}$. Actually, being held and released magnetically, the bob
usually starts at some unknown time so that it has some unknown velocity at the
time of the first spark following release. Thus there are three unknown
quantities, $g$, $S_0$, and $v_0$, where the latter two are the initial position and
initial velocity, respectively.

The proper way to observe the position is to place the strip of waxed
paper on a flat table and lay a meter stick along this waxed paper in such
a manner that one can observe on the meter stick the positions of the pin
holes; one should not measure individual distances between pin holes.*
We may label these observations $S_1, S_2, S_3$, etc. From theory we know
that the $S$'s should satisfy the expression

$$S = S_0 + v_0 t + \tfrac{1}{2}gt^{2},$$

where $S_0$, $v_0$, and $g$ are unknown. While in this problem we may be
interested only in the value of $g$, we cannot ignore the fact that $S_0$ and $v_0$
are also unknown.

The values of $t$, measured from zero at the first pin hole, are assumed
to be known accurately. Actually, chance errors will also occur during the
“observations” of $t$. However, when the timing is done by means of the
local a-c power source, which is used for controlling electric clocks, we can
consider the time as so accurately determined that these errors are
negligible. Fortunately, it is true in a great many experiments in the
physical sciences that one of the observed quantities is determined so
much more accurately than the others that chance errors in this particular
quantity can be ignored by comparison with those in the others. In what
follows we shall consider only those cases in which one type of observation
is in error.

* E. M. Pugh, Am. Phys. Teacher (now the Am. J. Phys.), 4, 70 (1936).

We can write a set of observation equations,


$$\begin{aligned}
S_0 + v_0 t_1 + \tfrac{1}{2} g t_1^{2} &\doteq S_1, \\
S_0 + v_0 t_2 + \tfrac{1}{2} g t_2^{2} &\doteq S_2, \\
&\;\;\vdots \\
S_0 + v_0 t_n + \tfrac{1}{2} g t_n^{2} &\doteq S_n,
\end{aligned}$$

where the symbol $\doteq$ should be read, “is observed to be equal to.” To
illustrate the solution of this by the method of least squares,* let us
consider a special problem involving seven observations with the Behr
free-fall apparatus in which $t_1 = -3T$, $t_2 = -2T$, $t_3 = -T$, $t_4 = 0$,
$t_5 = T$, $t_6 = 2T$, and $t_7 = 3T$, where $T$ is the period between sparks.
The corresponding observations are $S_{-3}, S_{-2}, S_{-1}, S_0, S_1, S_2$, and $S_3$. To
simplify the form of the observation equations it is desirable to let
$\tfrac{1}{2}gT^{2} = A$, $v_0 T = B$, and $S_0 = C$, where $A$, $B$, and $C$ are now the
unknown quantities. It is very important to note that each of these
latter equations contains only one of the original unknowns. The
observation equations then become

$$\begin{aligned}
9A - 3B + C &\doteq S_{-3}, & A + B + C &\doteq S_1, \\
4A - 2B + C &\doteq S_{-2}, & 4A + 2B + C &\doteq S_2, \\
A - B + C &\doteq S_{-1}, & 9A + 3B + C &\doteq S_3, \\
C &\doteq S_0. &&
\end{aligned}$$

Usually, values of $A$, $B$, and $C$ cannot be found which will make these
genuine equalities because the two sides of the equations differ by small
residual errors. The residuals are

$$
\begin{aligned}
v_{-3} &= S_{-3} - (9A - 3B + C), \\
v_{-2} &= S_{-2} - (4A - 2B + C), \\
&\;\;\vdots \\
v_3 &= S_3 - (9A + 3B + C).
\end{aligned}
$$

Thus

$$\sum v^2 = \{S_{-3} - (9A - 3B + C)\}^2 + \cdots + \{S_3 - (9A + 3B + C)\}^2.$$

To obtain the most probable value of the unknown quantities $A$, $B$, and
$C$, we must adjust each one separately. If we differentiate $\sum v^2$ with
respect to $A$ and set the resulting expression equal to zero, we will ensure


* Various trick definitions and substitutions are used in what follows. It is well 
to acquire facility of this sort since it makes for convenience and brevity during 
algebraic manipulations. 
a most probable value for $A$. However, we must also perform the same
differentiation with respect to both $B$ and $C$ to obtain their most probable
values. Differentiating with respect to each of these unknowns separately
and equating the results to zero, we obtain three different equations,
called “normal equations”:

$$
\begin{aligned}
0 = \frac{\partial \sum v^2}{\partial A} &= -2 \times 9\{S_{-3} - (9A - 3B + C)\} \\
&\quad - 2 \times 4\{S_{-2} - (4A - 2B + C)\} - \cdots, \\
0 = \frac{\partial \sum v^2}{\partial B} &= 2 \times 3\{S_{-3} - (9A - 3B + C)\} \\
&\quad + 2 \times 2\{S_{-2} - (4A - 2B + C)\} + \cdots, \\
0 = \frac{\partial \sum v^2}{\partial C} &= -2\{S_{-3} - (9A - 3B + C)\} \\
&\quad - 2\{S_{-2} - (4A - 2B + C)\} - \cdots.
\end{aligned}
$$

With these we can easily solve for the unknown quantities. 

We should interrupt our discussion at this point to emphasize one aspect 
of what has been done. In this problem, we began with seven observation 
equations but only three unknowns. But by the application of the method 
of least squares this overdetermined problem, with more equations than 
unknowns, has been converted to one with the same number of equations 
as there are unknowns. 

When we divide each of the three normal equations by $-2$ and collect
terms, we obtain the following result:

$$
\begin{aligned}
196A + 28C &= 9(S_3 + S_{-3}) + 4(S_2 + S_{-2}) + (S_1 + S_{-1}), \\
28B &= 3(S_3 - S_{-3}) + 2(S_2 - S_{-2}) + (S_1 - S_{-1}), \\
28A + 7C &= (S_3 + S_{-3}) + (S_2 + S_{-2}) + (S_1 + S_{-1}) + S_0.
\end{aligned}
$$

Since A is the only unknown that is desired, we solve the first and third 
of these normal equations by multiplying the third equation by 4 and 
subtracting it from the first, obtaining 

$$A = \frac{5(S_3 + S_{-3}) - 3(S_1 + S_{-1}) - 4S_0}{84}.$$

Since $A = \tfrac{1}{2} g T^2$,

$$g = \frac{5(S_3 + S_{-3}) - 3(S_1 + S_{-1}) - 4S_0}{42T^2}.$$

If we wish, we can also calculate B and C from the three normal equations. 
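
The closed form for $g$ can be checked numerically. The following is a minimal Python sketch and not part of the original treatment; the function name `g_from_seven_readings`, the spark period of 1/60 s, and the synthetic readings are assumptions chosen only for illustration.

```python
def g_from_seven_readings(s, T):
    """Least-squares g from seven equally spaced readings.

    s -- [S_-3, S_-2, S_-1, S_0, S_1, S_2, S_3], positions at t = -3T ... 3T
    T -- period between sparks
    """
    if len(s) != 7:
        raise ValueError("expected exactly seven readings")
    s_m3, _s_m2, s_m1, s_0, s_1, _s_2, s_3 = s
    # S_-2 and S_2 cancel when the third normal equation is eliminated.
    return (5 * (s_3 + s_m3) - 3 * (s_1 + s_m1) - 4 * s_0) / (42 * T ** 2)


# Synthetic, error-free readings (hypothetical numbers, not measured data).
T = 1.0 / 60.0                            # assumed spark period, seconds
g_true, v_mid, s_mid = 980.0, 50.0, 10.0  # cm/s^2, cm/s, cm at t = 0
readings = [s_mid + v_mid * k * T + 0.5 * g_true * (k * T) ** 2
            for k in range(-3, 4)]
g_est = g_from_seven_readings(readings, T)
print(g_est)  # close to 980 for error-free data
```

With error-free input the formula should return the value of $g$ used to generate the readings, whatever the velocity and position at $t = 0$, which is a quick check that the elimination of $C$ was carried out correctly.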

Note that this problem was made much simpler by using symmetrical 
coefficients and subscripts because several quantities which would ordinarily 
have appeared in the normal equations were made to equal zero by this 
device. Many problems we encounter in using the method of least squares
can be simplified by similar devices; a good example is the fitting of data 
by polynomials, which is discussed by Birge [10]. 

10-3 GENERAL SOLUTION FOR LINEAR UNKNOWNS 

The example in Section 10-2 is typical of many problems in science; it 
contains several unknowns which enter linearly into the problem. Since 
there are numerous physical problems with this characteristic, and since 
even if the unknowns do not enter into the observation equations linearly, 
methods can be employed which will make them linear, it is worth while 
to solve the linear problem in general terms. Before proceeding to the 
solution, however, we must discover at which point the linearity enters 
the problem. The problem at hand is the determination of values of the 
“constants” in the observation equations. In the example of the previous 
section, the experimental variable $t$ entered nonlinearly, but this did
not affect the linearity of the least-squares problem since all the $t$’s and
$t^2$’s were known from direct observations. The unknowns in the problem
were $g$, $v_0$, and $S_0$, which entered the equations linearly.

Consider then the general problem in which $q$ unknowns are to be
obtained from $n$ observation equations, $n$ being always larger than $q$.
Let $A, B, \ldots, Q$ be the unknown quantities (coefficients of the
experimentally determined or assigned independent variables in the observation
equations) whose most probable values are to be determined. Let the
observation equations be

$$
\begin{aligned}
A a_1 + B b_1 + \cdots + Q q_1 &\doteq M_1, \\
A a_2 + B b_2 + \cdots + Q q_2 &\doteq M_2, \\
&\;\;\vdots \\
A a_n + B b_n + \cdots + Q q_n &\doteq M_n,
\end{aligned}
\tag{10-2}
$$

where the $M$’s are the observations and the $a$’s, $b$’s, etc., the experimentally
known variables. Note, for instance, that $b_1$ could be $a_1^2$, etc., up to $q_1 = a_1^q$,
and the least-squares problem would still be linear.

As before, we must calculate the sum of the squares of the residuals 
and differentiate this quantity with respect to each of the unknowns to 
obtain q normal equations. From these q normal equations we can deter- 
mine the q unknowns. 

The sum of the squares of the residuals is 

$$
\begin{aligned}
\sum v^2 = {} & \{M_1 - (A a_1 + B b_1 + \cdots + Q q_1)\}^2 \\
& + \{M_2 - (A a_2 + B b_2 + \cdots + Q q_2)\}^2 + \cdots \\
& + \{M_n - (A a_n + B b_n + \cdots + Q q_n)\}^2,
\end{aligned}
\tag{10-3}
$$

from which, for instance,

$$
\begin{aligned}
\frac{\partial \sum v^2}{\partial A} = {} & -2a_1\{M_1 - (A a_1 + B b_1 + \cdots + Q q_1)\} \\
& - 2a_2\{M_2 - (A a_2 + B b_2 + \cdots + Q q_2)\} \\
& - \cdots - 2a_n\{M_n - (A a_n + B b_n + \cdots + Q q_n)\} = 0.
\end{aligned}
\tag{10-4}
$$

When Eq. (10-4) is divided through by $-2$ and the terms are collected,
the result is

$$[aa]A + [ab]B + \cdots + [aq]Q = [aM], \tag{10-5}$$

where by definition

$$[aa] = \sum_{i=1}^{n} a_i^2, \qquad [ab] = \sum_{i=1}^{n} a_i b_i, \qquad \text{etc.}$$

Equation (10-5) is called the normal equation for A since it is obtained 
by differentiation with respect to A. Similarly, we can obtain normal 
equations for B and the other unknowns. We then have q equations in q 
unknowns, 

$$
\begin{aligned}
{}[aa]A + [ab]B + \cdots + [aq]Q &= [aM], \\
{}[ab]A + [bb]B + \cdots + [bq]Q &= [bM], \\
&\;\;\vdots \\
{}[aq]A + [bq]B + \cdots + [qq]Q &= [qM],
\end{aligned}
\tag{10-6}
$$

which can be solved by any of the common methods for simultaneous
equations. With this general solution, preparing linear observation
equations for solution by the method of least squares is relatively easy
whenever they can be put into the form of the general equations
(10-2). We only need to carry out the summations indicated by the
brackets and form the normal equations.
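
As a rough illustration of how the bracket sums lead to Eqs. (10-6), the following Python sketch forms the normal equations from lists of known coefficients and solves them by ordinary elimination. The helper names `bracket`, `normal_equations`, and `solve`, and the made-up observations, are assumptions introduced here; elimination is only one of the common methods the text refers to.

```python
def bracket(u, v):
    """Gauss bracket [uv]: the sum of u_i * v_i over all observations."""
    return sum(ui * vi for ui, vi in zip(u, v))


def normal_equations(columns, M):
    """Form Eqs. (10-6) from the known columns a, b, ..., q and observations M."""
    matrix = [[bracket(cj, ck) for ck in columns] for cj in columns]
    rhs = [bracket(cj, M) for cj in columns]
    return matrix, rhs


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting (one of many possible methods)."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


# Free-fall layout of Section 10-2: a_i = k^2, b_i = k, c_i = 1 for k = -3..3.
ks = list(range(-3, 4))
a = [k * k for k in ks]
b = [k for k in ks]
c = [1 for _ in ks]
M = [2 * k * k + 3 * k + 5 for k in ks]   # made-up, error-free observations
matrix, rhs = normal_equations([a, b, c], M)
A, B, C = solve(matrix, rhs)
print(matrix)   # coefficients 196, 28, 7 as in Section 10-2
print(A, B, C)  # close to 2, 3, 5
```

Because the matrix of brackets is symmetric, only the sums on and above the diagonal are distinct; the sketch recomputes them all for brevity.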

For anyone faced with the necessity of doing this work entirely by hand, 
we have a convenient arrangement for calculating the sums needed in the 
general solution; it is shown for the case of two unknowns A and B in 
Table 10-1, where the observation equations are written on the left-hand 
side and the sums calculated by filling the columns to the right and 
adding. Fortunately, many of the summations are repeated in the normal 
equations so that much labor is saved by computing each sum only once. 

When a desk calculator is used, the sums should be accumulated as the 
products are formed; the individual products are not needed. Thought 
should be given to obtaining maximum efficiency. For example, if the $a_i$
are all unity in Eqs. (10-2), then

$$[ab] = \sum b \quad \text{and} \quad [bb] = \sum b^2.$$
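Table 10-1

| Observation equation | $aa$ | $ab$ | $bb$ | $aM$ | $bM$ |
|---|---|---|---|---|---|
| $A a_1 + B b_1 \doteq M_1$ | $a_1^2$ | $a_1 b_1$ | $b_1^2$ | $a_1 M_1$ | $b_1 M_1$ |
| $A a_2 + B b_2 \doteq M_2$ | $a_2^2$ | $a_2 b_2$ | $b_2^2$ | $a_2 M_2$ | $b_2 M_2$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| $A a_n + B b_n \doteq M_n$ | $a_n^2$ | $a_n b_n$ | $b_n^2$ | $a_n M_n$ | $b_n M_n$ |
| Sums | $[aa]$ | $[ab]$ | $[bb]$ | $[aM]$ | $[bM]$ |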


Most desk calculators can be set to accumulate these two sums
simultaneously. Similarly, in the present case,

$$[aM] = \sum M \quad \text{and} \quad [bM] = \sum bM.$$

If one is careful in entering the values of M consistently in the proper 
way, one can also obtain these sums simultaneously. 

One may think that an automatic computer is ideal for this job — the 
whole business can be programmed so that the machine will put out the 
least-squares values of the unknowns, the errors in them, and any other 
information that is desired and which can be obtained from the available 
data. Indeed, this is all true, but remarkably enough, there is a precaution 
that must be observed regarding the number of digits carried in computa- 
tions. Warning was given in Chapter 2 that while solving simultaneous 
equations, one must assume that the coefficients of the unknowns are 
exact. But differences will develop in any solution which can be known 
sufficiently accurately only if large numbers of digits, in line with the 
assumption that the starting numbers are exact, are carried along. It 
has been roughly true that the larger and faster the automatic computer, 
the smaller the number of digits it can store at a single memory location, 
a failing that is being corrected in the newer computers. The authors once 
saw least-squares calculations of exactly the sort being discussed here, 
i.e., with three unknowns, go badly awry on an IBM 704 for just this 
reason! Correct answers were obtained when a single number with twice 
as many digits was stored in two memory locations. After mentioning this 
example, let us emphasize that the user of a desk calculator must watch 
this same problem. Every possible digit should be carried, and the results 
must be examined with a critical eye. 

This admonition often applies to the generation of the $[aM]$, $[aa]$, etc.,
as well. The safest rule here, contrary to Section 3-1, is to carry at least
as many digits in every operation as fit on a 10-bank calculator. The least
check one should make is to see if the calculated curve goes through the
data as described in Section 4-3.
Whether with the aid of a desk calculator or an automatic computer,
we can use any of a number of methods to solve the normal equations.
Frequently, as in the particular problem with the Behr free-fall apparatus
just solved, the method of solution is obvious. We can, however, give a
general solution by using the determinantal notation. For example, the
second unknown $B$ can be given by

$$
B = \frac{
\begin{vmatrix}
{}[aa] & [aM] & [ac] & \cdots & [aq] \\
{}[ab] & [bM] & [bc] & \cdots & [bq] \\
\vdots & \vdots & \vdots & & \vdots \\
{}[aq] & [qM] & [cq] & \cdots & [qq]
\end{vmatrix}
}{
\begin{vmatrix}
{}[aa] & [ab] & [ac] & \cdots & [aq] \\
{}[ab] & [bb] & [bc] & \cdots & [bq] \\
\vdots & \vdots & \vdots & & \vdots \\
{}[aq] & [bq] & [cq] & \cdots & [qq]
\end{vmatrix}
}.
$$

A similar expression can be written for each of the other unknowns. 
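
The determinantal solution can be sketched in a few lines of Python. This is an illustration only: the function names `det` and `cramer` are hypothetical, the determinant is expanded recursively along the first row (practical only for a handful of unknowns), and the numbers are the free-fall coefficients of Section 10-2 with made-up right-hand sides.

```python
def det(m):
    """Determinant by expansion along the first row (fine for small systems)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def cramer(matrix, rhs):
    """Solve the normal equations by replacing one column at a time."""
    d = det(matrix)
    if d == 0:
        raise ValueError("normal equations are singular")
    solution = []
    for k in range(len(rhs)):
        # Column k of the coefficient matrix is replaced by [aM], [bM], ...
        replaced = [row[:k] + [r] + row[k + 1:] for row, r in zip(matrix, rhs)]
        solution.append(det(replaced) / d)
    return solution


matrix = [[196, 0, 28], [0, 28, 0], [28, 0, 7]]
rhs = [532, 84, 91]          # made-up right-hand sides for illustration
A, B, C = cramer(matrix, rhs)
print(A, B, C)               # 2.0 3.0 5.0
```

The cost of the recursive expansion grows factorially with the number of unknowns, which is one concrete reason to prefer elimination for larger systems.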

The use of determinants can be very tedious when the number of un- 
knowns is greater than three; in such an event the reader might well find 
other methods more attractive. One such method* is described in 
Appendix 9; if it is to be used exactly as described, the designations of 
the unknowns should be such that 

$$|A| < |B| < \cdots < |Q|,$$

as well as one can judge, with the order of the equations maintained as 
shown in Eqs. (10-6), where the first one is the normal equation for A, 
the second is the normal equation for B, and so on. This is accomplished 
most conveniently by starting with the unknowns arranged in the above 
order in the observation equations (10-2). 


10-4 LEAST-SQUARES FITTING OF A STRAIGHT LINE

It was pointed out in Section 4-3 that one should learn as much as possible 
about the fitting of a straight line to (appropriate) experimental data. 
We shall describe in this section the computational methods to be applied 
when there is no a priori reason to suppose that the data are not all equally
worthy of consideration. 

Various notations are used to describe the general straight line, par- 
ticularly in the literature on the subject under consideration here. The 


* Another useful method is Jordan elimination. See, for instance, E. Bodewig, 
Matrix Calculus (New York: Interscience, 1959, p. 107). 
most common of these are

$$y = a + bx,$$

as used in Section 4-3, and

$$y = mx + b.$$

There is much confusion between $y = a + bx$ and the equally common

$$y = ax + b.$$

In all these equations x is the independent variable, whose values are 
assigned during the experiment, and y is the observed dependent variable. 
A specific example would be the motion of a body moving along a scale 
with constant velocity $v$. If at time $t = 0$ the body is at $s_0$ on the scale,
its subsequent positions $s$ will be given by

$$s = s_0 + vt.$$

Generally one can read a “clock” more accurately than a scale position. 
Hence the latter is treated as the quantity to be observed at assigned 
values of t. 

Notations tend to become somewhat confusing in this work because 
the experimental variables become the least-squares constants, and vice 
versa. In the above example, $s$ and $t$ are observed and therefore are treated
as constants, while $s_0$ and $v$ must be treated as variables since their most
probable values are being determined. When the computations for this
section are discussed,

$$y = A + Bx$$

will be the form used. Here, numerical values of $y$ and $x$ are known as the
result of the performance of an experiment; $A$ and $B$ are the quantities
whose numerical values are to be determined by, hence are the variables
for, the least-squares minimizing process.

In the general straight-line equation, the coefficient of x is called the 
slope, and the constant term the intercept. A plot of $s$ vs. $t$ for the specific
example would then have $v$ for its slope and $s_0$ for its intercept.

We next note that the number of quantities to be determined by the 
method of least squares, two, is the same as in the example of Section 10-3 
to which Table 10-1 applies. Indeed, we can cast the observation equation

$$A a_1 + B b_1 \doteq M_1$$

into the form of this section by dividing by $a_1$ and letting $b_1/a_1$ play the
role of $x_1$ and $M_1/a_1$ the role of $y_1$. On the other hand, the reader should
be cautioned to be sure he knows what he is doing before making such a
transformation. For instance, it is implied by the original observation
equation that $M_1$ is the quantity actually observed in the experiment,
not $M_1/a_1$. This point is important, and later we shall look into the
reason why when we discuss observations of unequal weight, where the
division by $a_1$ changes the weight of $M_1$. Note that in the specific example
of the moving body, it is actually $s$ which is observed at given values of $t$.
Since the case where the observation equations are of the form 

y = A + Bx 

is one of the most common, it is desirable to restate it. Suppose that when
$x$ has the value $x_i$, $y$ is observed to have the value $y_i$. If the quantity
$A + Bx_i$ is calculated by using some pair of values for $A$ and $B$, then the
difference between this and the observed value is

$$A + Bx_i - y_i.$$

We can apply the method of least squares by squaring this number and
those obtained for all other values of $i$, summing these squares, and
adjusting $A$ and $B$ until this sum is a minimum. This last operation is
done with the aid of calculus, as before. Thus

$$\sum_i v_i^2 = \sum_i (A + Bx_i - y_i)^2,$$

$$\frac{\partial}{\partial A} \sum_i v_i^2 = 0 = 2 \sum_i (A + Bx_i - y_i), \tag{10-7}$$

$$\frac{\partial}{\partial B} \sum_i v_i^2 = 0 = 2 \sum_i x_i (A + Bx_i - y_i). \tag{10-8}$$

Since $\sum_i A$ means the addition of as many values of $A$ as there are values
of $i$, and if we let $i = 1, \ldots, n$, then Eq. (10-7) becomes

$$nA + B \sum_i x_i = \sum_i y_i,$$

and Eq. (10-8) becomes

$$A \sum_i x_i + B \sum_i x_i^2 = \sum_i x_i y_i.$$

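These are to be solved for $A$ and $B$. The result is

$$A = \Delta^{-1}\Bigl[\Bigl(\sum_i x_i^2\Bigr)\Bigl(\sum_i y_i\Bigr) - \Bigl(\sum_i x_i\Bigr)\Bigl(\sum_i x_i y_i\Bigr)\Bigr], \tag{10-9}$$

$$B = \Delta^{-1}\Bigl[n \sum_i x_i y_i - \Bigl(\sum_i x_i\Bigr)\Bigl(\sum_i y_i\Bigr)\Bigr], \tag{10-10}$$

where

$$\Delta = n \sum_i x_i^2 - \Bigl(\sum_i x_i\Bigr)^2.$$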

An example is shown in Table 10-2. In order to be sure that he has a 
complete understanding of the procedures, the reader should verify all 
the sums and solutions shown in the table for the values of $x_i$ and $y_i$
given. If a desk calculator is available, it can be used in the manner 
suggested in the previous section. 


Table 10-2

| $x_i$ | $y_i$ |
|---|---|
| 0.1 | 5.9 |
| 0.7 | 42.0 |
| 1.3 | 87.6 |

$\sum x_i = 2.1$, $\sum x_i^2 = 2.19$, $\sum y_i = 135.5$, $\sum x_i y_i = 143.87$,
$\Delta = 2.16$, $A = -2.49$, $B = 68.08$.
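
The sums and solutions of Table 10-2 can be verified with a short Python sketch of Eqs. (10-9) and (10-10). The function name `fit_line` is an assumption introduced here; the data are the three pairs given in the table.

```python
def fit_line(xs, ys):
    """Least-squares intercept A and slope B for y = A + B x, equal weights."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sxx - sx ** 2
    A = (sxx * sy - sx * sxy) / delta   # Eq. (10-9)
    B = (n * sxy - sx * sy) / delta     # Eq. (10-10)
    return A, B, delta


A, B, delta = fit_line([0.1, 0.7, 1.3], [5.9, 42.0, 87.6])
print(round(A, 2), round(B, 2), round(delta, 2))  # -2.49 68.08 2.16
```

Each sum is formed once and reused in both $A$ and $B$, mirroring the advice given for the desk calculator.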


A fairly common special example of the two-parameter equation is the 
case where the values of x are evenly spaced. For instance, in the Kundt’s 
tube experiment* the wavelength of sound in the tube is determined by 
measuring the distance between the equally spaced nodes in the cork dust. 
The only sensible way to make this measurement is to place a meter stick
along the tube and note the position of the nodes on this meter stick, as
was described for the Behr free-fall experiment. In this way, if an
error in observation at one point tends to increase the
distance between one set of nodes, this same error will tend to decrease
the distance between an adjacent set of nodes, thus somewhat
compensating for the error. Naively, one might expect that the best way to
obtain the spacing from the suggested set of measurements would be to
subtract adjacent readings to obtain the approximate spacing and then
to average the results thus obtained. However, if one does this algebraically,
he discovers that the average obtained is

$$S_{\mathrm{av}} = \frac{(y_r - y_{r-1}) + (y_{r-1} - y_{r-2}) + \cdots + (y_{-r+1} - y_{-r})}{n - 1} = \frac{y_r - y_{-r}}{n - 1}.$$

The final result obtained by this procedure retains only the two end 
observations and does not take advantage of any of the accuracy that 
should be available from the other observations that have been made. 
The proper procedure is to consider the observations as representing a 


* See, for instance, F. W. Sears and M. W. Zemansky, University Physics, 
Reading, Mass.: Addison-Wesley, 1964, p. 503.
straight-line plot of $y$ vs. $x$, where $y$ is the reading on the meter stick and
$x$ is the ordinal number of the node being measured. Then the slope
represents the spacing desired.
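
The difference between the two procedures is easy to see numerically. In the following Python sketch (function names and node readings are hypothetical, chosen for illustration), one interior reading is disturbed: the average of adjacent differences does not change at all, while the least-squares slope responds to it.

```python
def mean_of_differences(ys):
    """Average of adjacent differences; the interior readings cancel."""
    diffs = [later - earlier for earlier, later in zip(ys, ys[1:])]
    return sum(diffs) / len(diffs)


def least_squares_slope(ys):
    """Slope of y vs. ordinal number x = 0, 1, ..., n-1, from Eq. (10-10)."""
    n = len(ys)
    xs = range(n)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)


nodes = [10.0, 27.1, 43.9, 61.2, 78.0]     # hypothetical readings, cm
shifted = [10.0, 30.0, 43.9, 61.2, 78.0]   # one interior reading disturbed

print(mean_of_differences(nodes), mean_of_differences(shifted))
print(least_squares_slope(nodes), least_squares_slope(shifted))
```

Both calls to `mean_of_differences` give (to rounding) the end-to-end value $(78.0 - 10.0)/4$, confirming that only the two end observations are used.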

We shall consider two cases. If there are an odd number of readings, let
$y$ for the central one be called $y_0$, let the spacing between values of $x$ be
called $d$ ($= 1$ for the Kundt’s tube example), and let there be $r$ readings
on each side of the origin. Then $2r + 1 = n$, the number of readings,
and

$$y_i = y_0 + Bid, \qquad -r \le i \le r.$$

With the aid of some hints given in Appendix 3 we can show from
Eq. (10-10) that

$$
B = \frac{r(y_r - y_{-r}) + \cdots + 3(y_3 - y_{-3}) + 2(y_2 - y_{-2}) + 1(y_1 - y_{-1})}{2d\{r^2 + (r - 1)^2 + \cdots + 3^2 + 2^2 + 1^2\}}
= \frac{\sum_{i=1}^{r} i(y_i - y_{-i})}{2d \sum_{i=1}^{r} i^2}.
$$

Note that the central reading does not appear. 

In the other case, if there are an even number of readings, let the origin 
be half-way between the two central readings. Then 


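$$
\begin{aligned}
y_i &= y_0 + B\,\frac{(2i - 1)d}{2}, & 0 &< i \le r, \\
y_i &= y_0 + B\,\frac{(2i + 1)d}{2}, & -r &\le i < 0,
\end{aligned}
$$

and $n = 2r$. Thus we have

$$B = \frac{(2r - 1)(y_r - y_{-r}) + \cdots + 3(y_2 - y_{-2}) + (y_1 - y_{-1})}{d[(2r - 1)^2 + \cdots + 3^2 + 1^2]},$$

and all the readings appear.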
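
The two evenly spaced formulas can be combined in one small Python function. This is a sketch under stated assumptions: the name `slope_evenly_spaced` and the sample readings are hypothetical, and the readings are supplied as a plain list in order of increasing $x$, so the symmetric subscripts are mapped onto list positions inside the function.

```python
def slope_evenly_spaced(ys, d=1.0):
    """Least-squares slope B for readings taken at evenly spaced x values."""
    n = len(ys)
    r = n // 2
    if n % 2 == 1:
        # Odd n: y_i sits at position r + i, and the central reading drops out.
        num = sum(i * (ys[r + i] - ys[r - i]) for i in range(1, r + 1))
        den = 2 * d * sum(i * i for i in range(1, r + 1))
    else:
        # Even n: y_i sits at position r + i - 1 and y_-i at position r - i.
        num = sum((2 * i - 1) * (ys[r + i - 1] - ys[r - i])
                  for i in range(1, r + 1))
        den = d * sum((2 * i - 1) ** 2 for i in range(1, r + 1))
    return num / den


odd_case = slope_evenly_spaced([10.0, 27.1, 43.9, 61.2, 78.0])
even_case = slope_evenly_spaced([10.0, 27.1, 43.9, 61.2])
print(odd_case, even_case)   # about 17.01 and 17.04
```

Because the weights $i$ or $2i - 1$ grow toward the ends, the outer readings carry the most influence on the slope, but every pair contributes.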


10-5 OBSERVATIONS OF UNEQUAL WEIGHT

Thus far in the treatment of least squares we have assumed that all the 
observations have the same precision measure. But it frequently happens 
that some observations are made with considerably greater precision than 
others. A good illustration is found in an early version of the Behr free- 
fall experiment,* where the position of the falling body is recorded by 
means of a vibrating tuning fork fixed to the falling body which makes a 
wavy trace on the waxed paper. The peaks of the waves, instead of the 


* R. L. Edwards, loc. cit. 
pin holes, determine the positions. When the body is moving slowly, the
waves are close together, and the peaks are very sharp so that positions 
can be determined with considerable accuracy. When the body is moving 
at higher speed, however, the peaks spread out into long waves, and it is 
much more difficult to determine the exact location of the peaks. In this 
case, the uncertainty in the observations made with the body traveling 
at high speed is large compared with the uncertainty for the observations 
with the body moving at low speed. Examples of weighting, including 
this one, will be discussed later. 

Where the observations are of unequal precision, we cannot consider 
them as coming from the same infinite parent distribution of errors. The 
errors of each of the observations must be characterized by different 
values of the measure of precision h in the equation of the normal-distri- 
bution function. 

Assume that a set of $n$ readings have errors $x_i$, each drawn from a
distribution with a different precision index $h_i$, but all with the same
expectation value. Then, as in Section 10-1, the probability of obtaining
the set is

$$P = \frac{h_1 h_2 \cdots h_n}{\pi^{n/2}} (\Delta x)^n \exp\Bigl(-\sum_{i=1}^{n} h_i^2 x_i^2\Bigr).$$

As before, the common expectation value is estimated by replacing the
$x_i$ by $v_i$ and adjusting the latter until $P$ reaches its maximum. This
value for $P$ is obtained when $\sum (h_i^2 v_i^2)$ is as small as possible; the most
probable values of the unknowns are obtained when $\sum (h_i^2 v_i^2)$ is a minimum.
This is the most general statement of the method of least squares.

Of course, since we have not yet discussed any procedures for estimating 
the precision measures or, equivalently, the standard deviations for the 
various distributions, the equation is not yet good for numerical work. 
We shall discuss these procedures in Section 10-8. 

A more useful form of the equation can be found by letting 


$$h_1^2 = w_1 h^2, \quad h_2^2 = w_2 h^2, \quad \ldots, \quad h_n^2 = w_n h^2, \tag{10-11}$$

where $h^2$ is some constant; it could be the least of the $h_i^2$ with a
corresponding $w$ of unity. The statement of the method of least squares then
becomes

$$h^2 (w_1 v_1^2 + w_2 v_2^2 + \cdots + w_n v_n^2) = \text{minimum}. \tag{10-12}$$

To see what this expression means, let us consider the case of observations
on a single unknown in which

$$v_1 = R_1 - M, \quad v_2 = R_2 - M, \quad \ldots, \quad v_n = R_n - M.$$

Here $M$ is the most probable value of the unknown quantity and should
be considered as a variable until its best value has been determined.
Then

$$\sum w v^2 = w_1 (R_1 - M)^2 + w_2 (R_2 - M)^2 + \cdots + w_n (R_n - M)^2.$$

This expression is to be differentiated with respect to $M$, and the result
set equal to zero:

$$0 = -2w_1 (R_1 - M) - 2w_2 (R_2 - M) - \cdots - 2w_n (R_n - M).$$

Solving this for $M$ gives

$$M = \frac{w_1 R_1 + w_2 R_2 + \cdots + w_n R_n}{w_1 + w_2 + \cdots + w_n},$$

which is the common form used in obtaining weighted averages. This 
shows that the w’s, introduced with Eqs. (10-11) without any other 
definition, are in fact the weights of the observations, and the quantity h 
represents the precision measure of the observations of weight unity. 

Equations (10-11) enable us to obtain a very important relation between 
the weights and the precision measures, namely, 


$$\frac{h_1^2}{h_2^2} = \frac{w_1}{w_2}, \tag{10-13}$$


which shows that the squares of the precision measures are directly 
proportional to the weights of the corresponding observations. Con- 
versely, the squares of the measures of spread are inversely proportional 
to the weights of the observations. 
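
A weighted average built on this proportionality can be sketched as follows. The function names and the two readings are hypothetical; the weights are taken as the reciprocals of the squared measures of spread, which fixes them only up to a common factor that cancels in the average.

```python
def weights_from_spreads(spreads):
    """Relative weights, inversely proportional to the squared spreads."""
    return [1.0 / s ** 2 for s in spreads]


def weighted_mean(readings, weights):
    """Most probable value M = sum(w * R) / sum(w)."""
    return sum(w * r for w, r in zip(weights, readings)) / sum(weights)


readings = [10.0, 12.0]   # hypothetical readings
spreads = [1.0, 2.0]      # hypothetical measures of spread
w = weights_from_spreads(spreads)
M = weighted_mean(readings, w)
print(w, M)               # [1.0, 0.25] and about 10.4
```

The reading with half the spread receives four times the weight, so the result lies much closer to 10.0 than to 12.0.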

Note that the repeated appearance of a particular observation increases 
its weight accordingly. This can be seen by referring to Eqs. (10-11) and 
(10-12). If, from the universe characterized by $h$, the residual $v_1$ is drawn
$w_1$ times, the residual $v_2$ drawn $w_2$ times, and so on, it is clear that the
complete sum of squares of all the residuals will be exactly the sum in
Eq. (10-12).

It is now possible to set down some very important relationships 
between precision measures, weights, and measures of spread. By combining
Eqs. (9-4), (9-6), (9-8), and (10-13), we obtain the following
equations:

$$\frac{h_1^2}{h_2^2} = \frac{w_1}{w_2} = \frac{\sigma_2^2}{\sigma_1^2} = \frac{p_2^2}{p_1^2} = \frac{\mathrm{a.d.}_2^2}{\mathrm{a.d.}_1^2}. \tag{10-14}$$

If $h$ and $\sigma$ are respectively the precision measure and the standard
deviation for the distribution of individual observations, and if $h_0$ and $\sigma_0$ are
the precision measure and standard deviation for the distribution of
averages of random samples of $n$ of these readings, then from Eqs. (10-14)
we find that

$$\frac{h_0^2}{h^2} = \frac{n}{1} = \frac{\sigma^2}{\sigma_0^2} \qquad \text{or} \qquad \sigma_0 = \frac{\sigma}{\sqrt{n}}. \tag{10-15}$$


This result will be more rigorously demonstrated in Chapter 12. 

Let us now return to the more general case, which is also governed by 
Eq. (10-12) ; that is, a set of observation equations like those in Section 
10-3 may frequently result from observations which are by no means of 
equal precision, or it may be possible and convenient to convert the 
actual observations to a linear form where the $M$’s are functions of the
actual observations and so have weights different from the latter. In such
cases the equations should be treated by making $\sum w v^2$ a minimum as
indicated by Eq. (10-12). Otherwise the procedure is exactly as in Section
10-3. The resulting normal equations are

$$
\begin{aligned}
{}[waa]A + [wab]B + \cdots + [waq]Q &= [waM], \\
{}[wab]A + [wbb]B + \cdots + [wbq]Q &= [wbM], \\
&\;\;\vdots \\
{}[waq]A + [wbq]B + \cdots + [wqq]Q &= [wqM],
\end{aligned}
\tag{10-16}
$$

where, for instance,

$$[waa] = \sum_{i=1}^{n} w_i a_i^2.$$

The result again is a set of simultaneous equations in q unknowns, and the 
solution is just as before. Examples will be given in the following sections. 
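
For two unknowns the weighted normal equations can be written out directly. The Python sketch below fits $y = A + Bx$ (so that $a_i = 1$ and $b_i = x_i$); the function name `weighted_line_fit` and the data are hypothetical. It also illustrates the earlier remark that giving an observation weight 2 is equivalent to entering it twice with weight 1.

```python
def weighted_line_fit(xs, ys, ws):
    """Fit y = A + B x by minimizing sum(w * v^2); here a_i = 1 and b_i = x_i."""
    waa = sum(ws)
    wab = sum(w * x for w, x in zip(ws, xs))
    wbb = sum(w * x * x for w, x in zip(ws, xs))
    waM = sum(w * y for w, y in zip(ws, ys))
    wbM = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    delta = waa * wbb - wab ** 2
    A = (waM * wbb - wab * wbM) / delta
    B = (waa * wbM - wab * waM) / delta
    return A, B


# Hypothetical data: the middle observation carries weight 2.
A_w, B_w = weighted_line_fit([0.0, 1.0, 2.0], [1.0, 2.9, 5.2], [1, 2, 1])

# The same observation entered twice, all weights unity.
A_r, B_r = weighted_line_fit([0.0, 1.0, 1.0, 2.0],
                             [1.0, 2.9, 2.9, 5.2], [1, 1, 1, 1])
print(A_w, B_w)   # about 0.9 and 2.1
print(A_r, B_r)   # the same values, to rounding
```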


10-6 CONDITION EQUATIONS 

It sometimes happens that a set of observations of several unknown 
quantities must satisfy exactly one or more theoretical conditions existing 
between the unknowns. For example, when we make measurements of 
three angles A, B, C of a plane triangle, the observations must satisfy 
the theoretical condition A + B + C = 180°. With this condition 
equation we can reduce the number of unknown quantities from three to 
two. There are then three observation equations in two unknowns, and 
we can apply the method of least squares. When there is more than one 
condition equation between the unknowns, each condition equation can
be used to eliminate one of the unknown quantities. Thus we can reduce 
the number of unknowns by the number of condition equations. 

Let us consider a specific problem. Assume that observations on three 
angles of a triangle have been made with the following results: 

| Angles | Weight |
|---|---|
| $A \doteq 42.31^\circ$ | 2 |
| $B \doteq 75.89^\circ$ | 1 |
| $C \doteq 62.13^\circ$ | 1 |

We can eliminate $C$ from the third observation equation by means of the
condition equation and obtain

$$-A - B \doteq -117.87^\circ.$$

The resulting three observation equations in A and B can be solved by 
the methods already given, but the amount of arithmetic can be con- 
siderably reduced by a device which is useful in many problems with the 
method of least squares. First, we find approximate values for the un- 
known quantities. Then we consider a set of new unknowns as the dif- 
ference between these approximate values and the most probable values 
of the unknowns. 

In this problem, we let 

$$A = 42.31^\circ + z_1, \qquad B = 75.89^\circ + z_2.$$

Substituting these values in the observation equations and using the
method of calculating summations given in Table 10-1, we obtain Table
10-3. The resulting normal equations are

$$
\begin{aligned}
3z_1 + z_2 &= -0.33, \\
z_1 + 2z_2 &= -0.33.
\end{aligned}
$$


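Table 10-3

| Observation equation | Weights | $waa$ | $wab$ | $wbb$ | $waM$ | $wbM$ |
|---|---|---|---|---|---|---|
| $z_1 \doteq 0$ | 2 | 2 | 0 | 0 | 0 | 0 |
| $z_2 \doteq 0$ | 1 | 0 | 0 | 1 | 0 | 0 |
| $-z_1 - z_2 \doteq 0.33^\circ$ | 1 | 1 | 1 | 1 | $-0.33^\circ$ | $-0.33^\circ$ |
| Summations | | 3 | 1 | 2 | $-0.33^\circ$ | $-0.33^\circ$ |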
Using determinants, we find that

$$
z_1 = \frac{\begin{vmatrix} -0.33 & 1 \\ -0.33 & 2 \end{vmatrix}}{\begin{vmatrix} 3 & 1 \\ 1 & 2 \end{vmatrix}} = -0.066^\circ.
$$


Similarly, $z_2 = -0.132^\circ$. The most probable values are then

$$
\begin{aligned}
A &= 42.31^\circ - 0.066^\circ = 42.244^\circ, \\
B &= 75.89^\circ - 0.132^\circ = 75.758^\circ.
\end{aligned}
$$

From the condition equation we find that $C = 61.998^\circ$. The same result
would have been obtained if the condition equation had been used to 
eliminate either A or B instead of C. Had the unknown being eliminated 
appeared in more than one observation equation, we would have been 
able to eliminate it from each by the use of the condition equation. 
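
The arithmetic of the triangle example can be reproduced with a few lines of Python. This is only a sketch of the calculation laid out in Table 10-3; the variable names are introduced here for illustration, and the data are the three observed angles and weights given above.

```python
observed = {"A": 42.31, "B": 75.89, "C": 62.13}   # degrees
weights = [2, 1, 1]

# Observation equations in the corrections z1, z2:
#   z1 = 0,  z2 = 0,  -z1 - z2 = (A + B + C)_observed - 180
a = [1, 0, -1]                                     # coefficients of z1
b = [0, 1, -1]                                     # coefficients of z2
m = [0.0, 0.0, sum(observed.values()) - 180.0]     # right-hand sides

waa = sum(w * ai * ai for w, ai in zip(weights, a))
wab = sum(w * ai * bi for w, ai, bi in zip(weights, a, b))
wbb = sum(w * bi * bi for w, bi in zip(weights, b))
waM = sum(w * ai * mi for w, ai, mi in zip(weights, a, m))
wbM = sum(w * bi * mi for w, bi, mi in zip(weights, b, m))

delta = waa * wbb - wab ** 2
z1 = (waM * wbb - wab * wbM) / delta
z2 = (waa * wbM - wab * waM) / delta

A = observed["A"] + z1
B = observed["B"] + z2
C = 180.0 - A - B                                  # condition equation
print(round(z1, 3), round(z2, 3))                  # -0.066 -0.132
print(round(A, 3), round(B, 3), round(C, 3))       # 42.244 75.758 61.998
```

The angle with weight 2 receives half the correction of each of the other two, and the three corrections together remove the whole 0.33° excess.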

The example used above is, of course, one of common occurrence, but 
it fails to point out a pitfall in the use of condition equations when the 
observations have unequal weights. The general nature of this danger is 
common to all work with observations of unequal weight. When using
condition equations, if one observed values of, say, $A$, $B$, and $C$, each
having different known weights, but the condition equation were of the
form

$$aA + bB + cC = Q,$$

rather than

$$A + B + C = Q,$$

one must remember that it is the weights of $A$, $B$, and $C$ which are known,
rather than the weights of $aA$, $bB$, or $cC$. Thus, if the value of $B$ is to be
replaced by use of the condition equation, the latter must be used in the
form

$$\frac{a}{b} A + B + \frac{c}{b} C = \frac{Q}{b}.$$


10-7 NONLINEAR OBSERVATION EQUATIONS

So far, in the applications of the method of least squares, we have treated 
only observation equations which are linear in the unknown quantities. 
It frequently happens, however, that one desires to use the method of 
least squares on observation equations in which the unknowns are non- 
linear. 

It is simplest to treat the case of two unknowns, from whose solution 
the extension to any number of unknowns is obvious. Assume that the 
unknown quantities are $Z_1$ and $Z_2$, and the observation equations are:

$$
\begin{array}{ll}
\text{Equations} & \text{Weight} \\
f_1(Z_1, Z_2) \doteq M_1 & w_1 \\
f_2(Z_1, Z_2) \doteq M_2 & w_2 \\
\quad\vdots & \;\vdots \\
f_n(Z_1, Z_2) \doteq M_n & w_n
\end{array}
\tag{10-17}
$$

where the functions $f_1, f_2, \ldots, f_n$ are nonlinear in $Z_1$ and $Z_2$.

It is almost never desirable to solve a problem of this nature by the method 
of least squares until one has obtained approximate values for the unknowns 
by other methods which usually require less calculation. Whenever possible, 
even with linear functions, the graphical methods of analysis treated in 
Chapter 4 should be used first because those methods are simpler and 
because they furnish an excellent method of testing the consistency of 
the observations. Frequently, glaring mistakes may have been made in 
the observations; they will show up in a graphical solution but will not 
be noticed in a solution by the method of least squares. Obviously, mistakes 
must be eliminated before reliable results can be obtained by the method 
of least squares. If no better method is available for obtaining approximate
values of $Z_1$ and $Z_2$ in Eqs. (10-17), we can always solve a pair of
observation equations.

Solution of nonlinear equations starts with approximate values for the
unknowns. Let us assume that the approximate values $A$ and $B$ have been
obtained for $Z_1$ and $Z_2$ respectively. Following the method of the previous
section, we set

$$Z_1 = A + z_1, \qquad Z_2 = B + z_2, \tag{10-18}$$

where $z_1$ and $z_2$ are considered as the new unknown quantities which are
to be determined. By Taylor’s theorem, we can approximate the left-hand
members of the observation equations as follows:

$$
\begin{aligned}
f_1(Z_1, Z_2) &= f_1(A, B) + \left.\frac{\partial f_1}{\partial Z_1}\right|_{A,B} z_1 + \left.\frac{\partial f_1}{\partial Z_2}\right|_{A,B} z_2, \\
f_2(Z_1, Z_2) &= f_2(A, B) + \left.\frac{\partial f_2}{\partial Z_1}\right|_{A,B} z_1 + \left.\frac{\partial f_2}{\partial Z_2}\right|_{A,B} z_2, \quad \text{etc.,}
\end{aligned}
\tag{10-19}
$$

where the subscript $A, B$ of the partial derivatives means that these
partial derivatives are to be evaluated numerically at $Z_1 = A$ and $Z_2 = B$.
The quantities $f_1(A, B)$, $f_2(A, B)$, etc., are numerical values which
should almost equal the observations $M_1$, $M_2$, etc., respectively, if $A$
and $B$ have been carefully chosen and if the observations are consistent.
Let

$$M_1 - f_1(A, B) = m_1, \qquad M_2 - f_2(A, B) = m_2, \quad \text{etc.,}$$

where the m’s are small numerical quantities which have the same residuals 
as the M’s if the approximations given in Eqs. (10-19) are sufficiently 
accurate for the purpose. We can then consider the m’s as the new obser- 
vations and obtain the observation equations: 


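$$
\begin{array}{ll}
\text{Equations} & \text{Weight} \\[4pt]
\left.\dfrac{\partial f_1}{\partial Z_1}\right|_{A,B} z_1 + \left.\dfrac{\partial f_1}{\partial Z_2}\right|_{A,B} z_2 \doteq m_1 & w_1 \\[8pt]
\left.\dfrac{\partial f_2}{\partial Z_1}\right|_{A,B} z_1 + \left.\dfrac{\partial f_2}{\partial Z_2}\right|_{A,B} z_2 \doteq m_2 & w_2 \\[4pt]
\quad\vdots & \;\vdots \\[4pt]
\left.\dfrac{\partial f_n}{\partial Z_1}\right|_{A,B} z_1 + \left.\dfrac{\partial f_n}{\partial Z_2}\right|_{A,B} z_2 \doteq m_n & w_n
\end{array}
\tag{10-20}
$$

which are linear in the unknown quantities $z_1$ and $z_2$. By this device it is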
always possible to make a linear system of equations from any system 
of nonlinear observation equations. 

As an example, suppose an experiment has been performed in which a 
vessel containing an unknown liquid is placed in an air space which is 
surrounded with cracked ice, and the temperature of the liquid is observed 
to be $\theta_1$, $\theta_2$, $\theta_3$, etc., at times $t_1$, $t_2$, $t_3$, etc. Our purpose may be to
determine the specific heat of the unknown liquid, having previously
determined the heat capacity of the containing vessel. According to Newton’s
Law of Cooling,

$$
\begin{array}{ll}
\text{Temperature} & \text{Weight} \\
\theta_0 e^{-bt_1} = \theta_1 & w_1 \\
\theta_0 e^{-bt_2} = \theta_2 & w_2 \\
\theta_0 e^{-bt_3} = \theta_3 & w_3 \\
\qquad\vdots & \;\vdots \\
\theta_0 e^{-bt_n} = \theta_n & w_n
\end{array}
$$

are the observation equations, where in general the time $t_i$ can be measured
much more accurately than the temperature. The solution of this problem
follows.

We obtain the approximate values for the unknowns $\theta_0$ and $b$ by the
graphical methods of Chapter 4. Suppose that these values are $\theta_0 = A$
and $b = B$. Then we define the most probable values of these constants
as

$$\theta_0 = A + z_1, \qquad b = B + z_2,$$

where $z_1$ and $z_2$ are the new unknown quantities to be determined. Using
Taylor’s theorem, we find that

$$
\begin{aligned}
\theta_1 &= Ae^{-Bt_1} + e^{-Bt_1} z_1 - At_1 e^{-Bt_1} z_2, \\
\theta_2 &= Ae^{-Bt_2} + e^{-Bt_2} z_1 - At_2 e^{-Bt_2} z_2, \\
&\;\;\vdots \\
\theta_n &= Ae^{-Bt_n} + e^{-Bt_n} z_1 - At_n e^{-Bt_n} z_2.
\end{aligned}
$$

Let

$$\theta_1 - Ae^{-Bt_1} = m_1,$$
$$\theta_2 - Ae^{-Bt_2} = m_2, \quad \text{etc.}$$

The observation equations then become

$$
\begin{array}{ll}
\text{Equations} & \text{Weight} \\
e^{-Bt_1} z_1 - At_1 e^{-Bt_1} z_2 = m_1, & w_1 \\
e^{-Bt_2} z_1 - At_2 e^{-Bt_2} z_2 = m_2, \ \text{etc.} & w_2, \ \text{etc.}
\end{array}
\tag{10-21}
$$


This system of observation equations is now linear in the unknowns and 
may be solved by the general methods described previously. 

We note that, in the above example as well as in the more general case,
the quantity $\sum w_i m_i^2$ is a zeroth-order approximation to the sum of the
weighted squares of the residuals since the $m_i$ are obtained by inserting
trial values for the unknowns in the observation equations. Because of problems
similar in nature to those raised in the discussion of Fig. 3-1 (large
second or third derivatives of the theoretical function or, especially, the
presence of maxima or minima for values of the unknowns near the trial
values), it can turn out that $\sum w_i v_i^2$ after solution for the $z_i$ is greater than
$\sum w_i m_i^2$. It is beyond the scope of this book to discuss all the contingencies
in this event, but usually it means that a poor choice of trial values of
the unknowns was made, and the problem should be solved with another
choice.

Conversely, even if $\sum w_i v_i^2 < \sum w_i m_i^2$, it does not follow that $\sum w_i v_i^2$ is a
minimum. The entire procedure should be repeated with the initial trial
values converted to new ones by the addition of the $z_i$, the derivatives
calculated for the new trial values, etc.
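
The repeated linearization described above can be sketched in a few lines of Python for the cooling curve. This is only an illustration under stated assumptions: the function names, the invented temperatures (an exponential with a small fixed scatter added), the unit weights, and the trial values 18.0 and 0.25 are all hypothetical and do not come from the text. Each pass forms the coefficients and the $m_i$ of Eqs. (10-21), solves the two normal equations, adds the corrections to the trial values, and records the weighted sum of squares so that its behavior from pass to pass can be inspected.

```python
import math

def weighted_sum_of_squares(times, temps, weights, A, B):
    """Sum of w*(theta - A*exp(-B*t))**2 for the trial values A and B."""
    return sum(w * (theta - A * math.exp(-B * t)) ** 2
               for t, theta, w in zip(times, temps, weights))

def correction_step(times, temps, weights, A, B):
    """Solve the linearized equations (10-21) for the corrections z1 and z2."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t, theta, w in zip(times, temps, weights):
        e = math.exp(-B * t)
        c1 = e               # coefficient of z1 (correction to theta_0)
        c2 = -A * t * e      # coefficient of z2 (correction to b)
        m = theta - A * e    # the new "observation" m_i
        s11 += w * c1 * c1
        s12 += w * c1 * c2
        s22 += w * c2 * c2
        r1 += w * c1 * m
        r2 += w * c2 * m
    det = s11 * s22 - s12 * s12
    z1 = (r1 * s22 - r2 * s12) / det
    z2 = (s11 * r2 - s12 * r1) / det
    return z1, z2

# Hypothetical readings: an exponential decay plus a small fixed scatter.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
scatter = [0.05, -0.03, 0.02, -0.04, 0.03, -0.01]
temps = [20.0 * math.exp(-0.30 * t) + s for t, s in zip(times, scatter)]
weights = [1.0] * len(times)

A, B = 18.0, 0.25   # assumed trial values, as if read from a graph
history = [weighted_sum_of_squares(times, temps, weights, A, B)]
for _ in range(10):
    z1, z2 = correction_step(times, temps, weights, A, B)
    A, B = A + z1, B + z2
    history.append(weighted_sum_of_squares(times, temps, weights, A, B))

print(f"theta_0 = {A:.4f}, b = {B:.5f}")
print("weighted sum of squares after each pass:", [round(s, 6) for s in history])
```

If the recorded sum of squares rises from one pass to the next, that is the signal discussed above that the trial values were poorly chosen.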

In the example of the cooling curve, a somewhat simpler solution is 
possible, provided one is careful to determine new weights for the obser- 
vation equations. The logarithms of the observed temperatures may be 
considered as the observations, and the new observation equations written 
as 

$$
\begin{aligned}
\ln\theta_0 - bt_1 &= \ln\theta_1, \\
\ln\theta_0 - bt_2 &= \ln\theta_2, \quad \text{etc.,}
\end{aligned}
\tag{10-22}
$$

where the unknowns are $\ln\theta_0$ and $b$. The actual observations are, of
course, the $\theta$’s and not the logarithms of these quantities. The method
employed must ensure that $\sum wv^2$ equal a minimum, where $v_1$ is the residual
for $\theta_1$, $v_2$ the residual for $\theta_2$, etc. If the observation equations (10-21)
are treated in the usual fashion, the result will be equivalent to making
$\sum wv^2$ a minimum. However, treating the new observation equations
(10-22) in this fashion is not equivalent to minimizing $\sum wv^2$, but rather
to minimizing $\sum wV^2$, where the $V$’s are the residuals in the logarithms
of the $\theta$’s. Thus $V_1$ is the residual in $\ln\theta_1$, $V_2$ the residual in $\ln\theta_2$, etc.

Now, $V_1$ is equal to the product of the small change in $\ln\theta_1$ for a small
change in $\theta_1$ and the residual for $\theta_1$; i.e.,

$$
\begin{aligned}
V_1 &= \frac{\partial(\ln\theta_1)}{\partial\theta_1}\,v_1 = \frac{v_1}{\theta_1}, \\
V_2 &= \frac{\partial(\ln\theta_2)}{\partial\theta_2}\,v_2 = \frac{v_2}{\theta_2}, \quad \text{etc.}
\end{aligned}
$$

Then $\sum wV^2 = \sum (w/\theta^2)v^2$, which amounts to giving the original equations
weights equal to $w_i/\theta_i^2$ $(i = 1, \ldots, n)$. Obviously these weights are
improper for the actual observations, but this discrepancy can be corrected
by supplying the weights $w_i\theta_i^2$ for the new observation equations.
The final result is then obtained by treating the observation equations in
$\ln\theta_i$ as linear equations in the unknowns but with weights $w_i\theta_i^2$.
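
As a sketch of the logarithmic shortcut, the following Python fragment fits the straight line in $\ln\theta$ with each equation given the corrected weight $w_i\theta_i^2$. The function name and the noise-free sample temperatures are assumptions made for illustration only; the arithmetic is the ordinary weighted straight-line solution of the two normal equations.

```python
import math

def fit_cooling_by_logs(times, temps, weights):
    """Fit ln(theta) = ln(theta_0) - b*t, weighting each equation by w*theta**2."""
    s = st = stt = sy = sty = 0.0
    for t, theta, w in zip(times, temps, weights):
        omega = w * theta * theta   # corrected weight for the equation in ln(theta)
        y = math.log(theta)
        s += omega
        st += omega * t
        stt += omega * t * t
        sy += omega * y
        sty += omega * t * y
    slope = (s * sty - st * sy) / (s * stt - st * st)
    intercept = (sy - slope * st) / s
    return math.exp(intercept), -slope   # theta_0 and b

# Hypothetical, noise-free temperatures generated from theta_0 = 20, b = 0.30.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
temps = [20.0 * math.exp(-0.30 * t) for t in times]
weights = [1.0] * len(times)

theta0, b = fit_cooling_by_logs(times, temps, weights)
print(f"theta_0 = {theta0:.4f}, b = {b:.5f}")
```

With scattered readings, omitting the factor $\theta_i^2$ would let the late, small temperatures pull the line as strongly as the early, large ones, which is the improper weighting described above.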

From the above example it should be obvious that any manipulation 
of the original observation equations will usually result in improper 
weighting. For example, if we clear a set of observation equations of 
fractions by multiplying by the denominators 2, 3, etc., we will find that 
this process is equivalent to weighting the observation equations by the 
squares of these denominators. Such a weighting can be corrected by 
using fractional weights, but we will then find that no advantage has been 
gained. The safest procedure is to use the original observation equations 
exactly as they occur, making sure that the quantity on the right-hand 
side of the = sign is the direct observation obtained from the experiment. 


10-8 COMPUTATION OF THE MEASURES OF SPREAD 

It was mentioned that the true errors $x_i$ are not equal to the residuals $v_i$
unless the number of observations $n$ approaches infinity. In statistical
discussions it is common to distinguish three standard deviations: $\sigma$,
the standard deviation of the infinite parent distribution just defined;
$\sigma'$, the best estimate of $\sigma$ that can be obtained from the given sample of
readings; and $S$, the sample standard deviation defined by $S^2 = (\sum v^2)/n$.
Since $\sigma$ cannot be found, we will assume here that our estimate of it, which
we shall describe below, is sufficiently close so that the difference can be
ignored.

If the expectation value $\mu$ of the distribution were known, then the
error $x$ could be found for each reading, and the best estimate of the
standard deviation for the distribution which we can obtain from $n$
observations would be given by $\sigma^2 = [\sum x^2]/n$.

In Fig. 10-1 suppose that the solid curve centered on $\mu$ represents the
parent distribution. Suppose, moreover, that the average, i.e., the most
probable, value of $\mu$ which is calculable from the $n$ readings is $M$. Then
the dashed curve of Fig. 10-1, centered on M, represents the best estimate 
of the solid curve which can be obtained from the n readings — with 
respect both to the location of its center and to its width. It corresponds 
to the crosses plotted in Fig. 9-1. 

Having found the best estimate of 
the center of symmetry of the dis- 
tribution, we must estimate the width 
of the solid curve, to be plotted as 
the width of the dashed curve, which 
includes 68.27% of the unit area 
under either curve. 

FIGURE 10-1 (horizontal axis marked at $\mu$, $M$, and $P_i$)

Let $B$ be the difference $M - \mu$. Then at the reading $P_i$,

$$x_i = v_i + B,$$
$$x_i^2 = v_i^2 + 2v_iB + B^2,$$
$$\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} v_i^2 + nB^2 \tag{10-23}$$

since $M$ was found by setting

$$\sum_{i=1}^{n} v_i = 0,$$

which is equivalent to Eq. (10-1). Equation (10-23) is to be expected
since the method of least squares has made certain that $\sum v^2 < \sum x^2$,
where, of course, $v_i$ and $x_i$ are for the same readings. The value of $B$ is
almost completely unknown; however, it must be of the same order of 
magnitude as one of the measures of spread for a distribution of averages 
of n readings. All of the measures of spread have the same form and are 
related to each other by constants that are close to one. It seems reasonable 
to assume, therefore, that 

$$B^2 = C\sigma_0^2 = C\,\frac{\sigma^2}{n} = \frac{C}{n}\,\frac{\sum x^2}{n},$$

where $C$ has a value near one. Then

$$\sum x^2 = \sum v^2 + \frac{C}{n}\sum x^2,$$
$$\sum x^2\left(1 - \frac{C}{n}\right) = \sum v^2,$$

and Eq. (10-23) becomes

$$\frac{\sum x^2}{n} = \frac{\sum v^2}{n - C}.$$


Obviously, the correction being considered is of no importance for 
large n. It is important, however, when n is rather small and we wish not 
to be too optimistic about the spread that may be obtained from this 
small number of readings. The worst possible case is where n = 1. Then it 
is obvious that no idea of the spread can be obtained, and the only value 
of $C$ that is reasonable in this region is 1, since for one reading $\sum v^2 = 0$
and $n - C = 0$. Thus $\sum x^2/n$ becomes indeterminate, as should be
expected. Therefore, in this case of a single variable, we set C equal to 
one. 

Thus for a single variable the best estimate of the standard deviation 
in a single reading is 


$$\sigma = \sqrt{\frac{\sum v^2}{n - 1}}, \tag{10-24}$$

and the best estimate for the standard deviation for the mean is, from
Eq. (10-15),

$$\sigma_0 = \sqrt{\frac{\sum v^2}{n(n - 1)}}. \tag{10-25}$$


The quantity (n — 1) in the above equations is the number of degrees 
of freedom of the system. By degrees of freedom we mean the number of 
observations in excess of the minimum theoretically required to obtain 
the unknown quantity. For example, one observation is theoretically 
sufficient to determine the length of a table. If ten observations are made, 
the system of ten observation equations has nine degrees of freedom 
because there are nine more observations than are theoretically required 
to obtain the length of the table. From this we see that if n observations 
are made on q unknowns, the number of degrees of freedom is (n — q ), 
and the value of the standard deviation for observations of unit weight 
becomes

$$\sigma = \sqrt{\frac{\sum v^2}{n - q}}.$$

When the observations are weighted, the standard deviation for the
observations of unit weight is

$$\sigma = \sqrt{\frac{\sum wv^2}{n - q}}, \tag{10-26}$$

and $\sigma$ for an observation of weight $w_k$ is given by

$$\sigma_{w_k} = \frac{\sigma}{\sqrt{w_k}} = \sqrt{\frac{\sum wv^2}{w_k(n - q)}} \tag{10-27}$$


[see Eq. (10-14)]. Finally, the standard deviation of the average of a set 
of observations of one unknown is 


$$\sigma_0 = \frac{\sigma}{\sqrt{\sum w}} = \sqrt{\frac{\sum wv^2}{(n - 1)\sum w}}. \tag{10-28}$$


Where observation equations are available for more than one unknown, 
the computation of the standard deviation of the unknown quantities 
is more complicated; and we need additional discussion before we can 
calculate it. This problem will be treated in the next chapter. 
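
For a single unknown, Eqs. (10-24) through (10-28) reduce to a short computation, sketched below in Python. The function name is an assumption, the readings are the first ten entries of Table 10-4 later in this chapter, and the second set of weights is invented purely to show the weighted call.

```python
import math

def spread_one_unknown(readings, weights=None):
    """Weighted mean, sigma for unit weight, and sigma of the mean (one unknown)."""
    n = len(readings)
    if weights is None:
        weights = [1.0] * n              # equally weighted observations
    sum_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, readings)) / sum_w
    sum_wv2 = sum(w * (y - mean) ** 2 for w, y in zip(weights, readings))
    sigma = math.sqrt(sum_wv2 / (n - 1))     # Eq. (10-26) with q = 1
    sigma_0 = sigma / math.sqrt(sum_w)       # Eq. (10-28)
    return mean, sigma, sigma_0

# First ten readings of Table 10-4.
readings = [5.4, 5.3, 5.5, 4.6, 5.2, 5.3, 5.2, 5.5, 5.6, 5.3]
mean, sigma, sigma_0 = spread_one_unknown(readings)
print(f"mean = {mean:.2f}, sigma = {sigma:.3f}, sigma_0 = {sigma_0:.3f}")

# The same readings with hypothetical unequal weights.
example_weights = [1, 1, 2, 1, 1, 2, 1, 1, 1, 2]
print(spread_one_unknown(readings, example_weights))
```

With unit weights $\sum w = n$, so the last line of the function reproduces Eq. (10-25).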


10-9 TREATMENT OF IDENTICAL READINGS 

A special case of considerable importance arises when a series of identical 
readings are obtained in the measurement of a single quantity. In this 
case, the average is the same as the individual observations. The distribution
is clearly not normal; a slavish substitution into the foregoing
equations gives $\sigma_0 = 0$, which is nonsense. This case usually comes about
when too coarse a scale is employed for reading the setting of an instru- 
ment; the setting can be adjusted more accurately than it can be read. 
Obviously, if one measures a steel bar to the nearest inch, all observations 
will be identical, but if one measures the same bar to the nearest 0.0001 in. 
he will obtain a spread. 

Suppose that in measuring a steel bar to the nearest inch, ten readings 
of 77 in. are obtained. It is certain that the true length lies between 
76.5 in. and 77.5 in., but all values between these limits are equally 
probable. This means the average of the ten readings should be considered 
to come from a rectangular distribution extending from 76.5 in. to 77.5 in. 
Using the results from Section 9-1 for this case, we find that 

$$\sigma = \frac{d}{2\sqrt{3}} = 0.29d, \qquad \sigma_0 = \frac{\sigma}{\sqrt{10}} = 0.092d,$$

where d is the width of the smallest interval observed. In this special 
case d = 1 in. The standard deviation of the result should be reported 
as 0.1 in. In no case should a standard deviation smaller than this be 
reported. 
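
A small Python sketch of this floor on the reported standard deviation follows. The function names are hypothetical, and the second function simply encodes the rule that nothing smaller than the rectangular-distribution value should be reported. Carried to more figures the ratio $\sigma_0/d$ for ten readings is about 0.091; the 0.092 above results from dividing the rounded 0.29 by $\sqrt{10}$.

```python
import math

def rectangular_spread(d, n):
    """Sigma and sigma_0 for n identical readings on a scale of interval d."""
    sigma = d / (2.0 * math.sqrt(3.0))   # rectangular distribution of width d
    sigma_0 = sigma / math.sqrt(n)
    return sigma, sigma_0

def reportable_sigma_0(computed_sigma_0, d, n):
    """Never report a standard deviation below the scale-reading floor."""
    _, floor = rectangular_spread(d, n)
    return max(computed_sigma_0, floor)

sigma, sigma_0 = rectangular_spread(1.0, 10)    # ten readings to the nearest inch
print(f"sigma = {sigma:.3f} in., sigma_0 = {sigma_0:.3f} in.")
print(reportable_sigma_0(0.0, 1.0, 10))         # identical readings give a computed 0
```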
10-10 THE REJECTION OF OBSERVATIONS 

In the first chapter a distinction was made between an error and a mistake. 
It is obvious that if one of the readings in a group is known to be the 
result of a mistake, one would wish to reject it. The problem is that such 
mistakes usually are not identifiable, and so one comes to consider whether 
there are not methods connected with the distribution of the readings by 
which he can make an educated guess as to whether any of the readings 
seem not to belong with the majority of them. 

While the whole idea of discarding readings is rejected by many in- 
vestigators, it is argued here that the possibility of an undetected mistake 
is very real. In this context “mistake” means more than a mistake by 
the operator in reading a scale, for instance. It is not at all uncommon 
for sensitive electronic equipment to respond in a transitory way to some 
outside influence, such as the turning on of a large motor in a neighboring 
laboratory, or for a transitory air current to affect the swing of a balance.
Events such as these introduce a reading which is not drawn from the
same universe as the remaining readings. The cost of allowing one’s
results to be distorted by a single bad reading is much higher than is that
of mistakenly rejecting one good reading out of ten, say.

In Chapter 12 we will consider methods which take into account much 
more precisely than those we have used so far the effects of the number of 
observations on such quantities as the probability that all the readings 
came from the same parent distribution. Nevertheless one can apply 
only the knowledge gained to this point to get results which are usefully 
accurate for samples even as small as five when the standard deviations 
of the parent distributions are as small as those with which the physical 
scientist is dealing most of the time. 

Two bases for rejection will be set forth. The first is easier to apply 
and, some would say, safer. The second is more enlightening and illus- 
trative; it is pedagogically more useful. 

The first method is to decide that if the chance of the occurrence of some 
particular reading is less than some arbitrarily fixed number, it will be 
assumed to be a mistake. As usual, we take the mean of the readings to
be the same as the expectation value for the distribution and assign to
$\sigma$ the estimate provided by Eq. (10-24). Then we see whether any residual
lies outside the value of $X$ given by Eq. (9-10) when $P_X$ is some arbitrarily
chosen number.

If it is decided that any residual having a chance of occurrence less 
than 1 in 1000, say, is to be rejected, then we let $P_X$ be $\tfrac{1}{2}(0.999)$ since
$\mathcal{P}(v < \infty)$ covers only half of the symmetric normal distribution. Reference
to Table A shows that the area $P_X$ under the distribution reaches 0.4995
when $X/\sigma$ is 3.29. [We recall that this $\sigma$ is not the standard deviation in
the mean of the readings in question; it is the standard deviation in a
single reading, the measure of the spread in the distribution of the
individual readings, as determined from Eq. (10-24).] Thus if we have a
residual greater than

$$X = 3.29\sqrt{\frac{\sum v^2}{n - 1}},$$

we can reject the reading. 

Note that this criterion will never allow the rejection of one out of ten
or fewer observations when the limiting probability of occurrence is set
at 0.1%. For, if it did, it would mean that there would have
to be a residual $X$ such that

$$X^2 > (3.29)^2\,\frac{\sum v^2}{n - 1},$$

where the coefficient of $(3.29)^2$ on the right-hand side is the sum of the
squares of 10 residuals, including $X^2$, divided by $n - 1$, or $10 - 1$. Since
$\sum v^2 \ge X^2$ and $(3.29)^2/9 \approx 1.20$, the right-hand side is never less than
$1.20X^2$, and it is clearly impossible to satisfy this inequality.

Consider the example given in Table 10-4. We see that the reading 4.6
can be rejected by this criterion, but it should be noted that though it
looks badly out of place, there is not a tremendous difference between its
residual and $3.29\sigma$. Readings should never be rejected on the basis of
“looks.”

After rejection, the remaining data should again be subjected to
averaging and the new $\sigma$ calculated. Usually it will be found that only
one reading can be rejected.
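
The fixed-probability test can be sketched in Python with the twenty readings of Table 10-4. The function names are assumptions, and the multiplier that the text reads from Table A is obtained here from the standard library's `statistics.NormalDist().inv_cdf`, a substitution made only for convenience. One reading at most is removed per pass, after which the mean and $\sigma$ are recomputed.

```python
import math
from statistics import NormalDist

def fixed_probability_limit(readings, chance=0.001):
    """Mean, sigma from Eq. (10-24), and the rejection limit X for the given chance."""
    n = len(readings)
    mean = sum(readings) / n
    sigma = math.sqrt(sum((y - mean) ** 2 for y in readings) / (n - 1))
    # The chance is split between the two tails; for 0.001 the ratio is about 3.29.
    ratio = NormalDist().inv_cdf(1.0 - chance / 2.0)
    return mean, sigma, ratio * sigma

def reject_one(readings, chance=0.001):
    """Remove at most one reading: the largest residual, if it exceeds the limit."""
    mean, _, limit = fixed_probability_limit(readings, chance)
    worst = max(readings, key=lambda y: abs(y - mean))
    if abs(worst - mean) > limit:
        kept = list(readings)
        kept.remove(worst)
        return kept, worst
    return list(readings), None

# The twenty readings of Table 10-4.
table_10_4 = [5.4, 5.3, 5.5, 4.6, 5.2, 5.3, 5.2, 5.5, 5.6, 5.3,
              5.4, 5.3, 5.5, 5.2, 5.3, 5.3, 5.2, 5.4, 5.3, 5.4]

mean, sigma, limit = fixed_probability_limit(table_10_4)
kept, rejected = reject_one(table_10_4)
kept_again, second_rejected = reject_one(kept)   # recompute before testing again
print(f"mean = {mean:.2f}, sigma = {sigma:.4f}, limit = {limit:.3f}")
print("first pass rejects:", rejected, " second pass rejects:", second_rejected)
```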

In the second method, called Chauvenet’s criterion, we make the 
limiting probability of occurrence for “acceptable” readings depend on 
the number of readings in the following way. In a set of n readings the 
number of errors that should be less than some value X in absolute value 


Table 10-4

    $n$    $y_n$    $v_n$        $n$    $y_n$    $v_n$
     1     5.4      0.09        11     5.4      0.09
     2     5.3     -0.01        12     5.3     -0.01
     3     5.5      0.19        13     5.5      0.19
     4     4.6     -0.71        14     5.2     -0.11
     5     5.2     -0.11        15     5.3     -0.01
     6     5.3     -0.01        16     5.3     -0.01
     7     5.2     -0.11        17     5.2     -0.11
     8     5.5      0.19        18     5.4      0.09
     9     5.6      0.29        19     5.3     -0.01
    10     5.3     -0.01        20     5.4      0.09

    $\bar{y} = 5.31$      $\sigma = 0.2024$      $3.29\sigma = 0.666$

is $2nP_X$, where the factor 2 appears because $P_X$ gives half the total area
under the error curve for $-X < v < X$. Then if $2nP_X$ is the number
of errors less than $X$, the number of errors greater than $X$ is $(n - 2nP_X)$.
If $P_X$ is large enough that $(n - 2nP_X)$ is less than one half of a
reading we can suppose that it is more probable than not that an error
greater than $X$ in magnitude does not belong in the same distribution as
the rest of them. Thus the limit of rejection is given by

$$\tfrac{1}{2} = n(1 - 2P_X),$$

or

$$P_X = \frac{2n - 1}{4n}. \tag{10-29}$$

As an example, consider the first ten values in Table 10-4. This set has
$\bar{y} = 5.29$ and $\sigma = 0.277$. With $n = 10$, Eq. (10-29) gives $P_X = 0.475$.
According to Table A, $P_X$ reaches this value at $X/\sigma = 1.96$, which, for
this set of data, means at $X = 0.54$. Since the error at the fourth point
is 0.69, we can reject the observation by this criterion. Recalculation with
nine points yields $\bar{y} = 5.37$, $\sigma = 0.141$, $P_X = 0.472$, $X/\sigma = 1.91$,
$X = 0.27$. The largest error is 0.23 so that no further points can be
rejected.
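
The worked example can be retraced with a short Python sketch of Chauvenet’s criterion. The function names are assumptions, and the ratio $X/\sigma$ is computed with `statistics.NormalDist().inv_cdf` rather than looked up in Table A. The loop rejects only the single largest residual on each pass and recomputes the mean and $\sigma$ before testing again.

```python
import math
from statistics import NormalDist

def chauvenet_limit(readings):
    """Mean, sigma, P_X from Eq. (10-29), and the rejection limit X."""
    n = len(readings)
    mean = sum(readings) / n
    sigma = math.sqrt(sum((y - mean) ** 2 for y in readings) / (n - 1))
    p_x = (2 * n - 1) / (4 * n)              # half-area, Eq. (10-29)
    ratio = NormalDist().inv_cdf(0.5 + p_x)  # X / sigma
    return mean, sigma, p_x, ratio * sigma

def chauvenet_filter(readings):
    """Reject one reading at a time until none exceeds the limit."""
    kept = list(readings)
    rejected = []
    while len(kept) > 2:
        mean, _, _, limit = chauvenet_limit(kept)
        worst = max(kept, key=lambda y: abs(y - mean))
        if abs(worst - mean) <= limit:
            break
        kept.remove(worst)
        rejected.append(worst)
    return kept, rejected

# First ten readings of Table 10-4.
first_ten = [5.4, 5.3, 5.5, 4.6, 5.2, 5.3, 5.2, 5.5, 5.6, 5.3]
mean, sigma, p_x, limit = chauvenet_limit(first_ten)
kept, rejected = chauvenet_filter(first_ten)
mean9, sigma9, p_x9, limit9 = chauvenet_limit(kept)
print(f"n = 10: mean = {mean:.2f}, sigma = {sigma:.3f}, P_X = {p_x:.3f}, X = {limit:.2f}")
print(f"n =  9: mean = {mean9:.2f}, sigma = {sigma9:.3f}, P_X = {p_x9:.3f}, X = {limit9:.2f}")
print("rejected:", rejected)
```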

We can now improve the estimates of the parameters of the distribution 
generated by the dart-dropping experiment described in Chapter 7 and 
discussed in Chapters 8 and 9. This is done by using Eq. (10-26), with 
q = 1 and w as the number of observations in the intervals, and applying 
the rejection criterion just described. We then obtain the result given in 
Table 10-5. Since the number of observations here is so large, the value
of $P_X$ remains constant to four significant figures at 0.4995 throughout.
The values in the last row of Table 10-5 will be used in examples in
Chapter 12.

Table 10-5

    $\sum w$     Mean       $\sigma$      $X$       Reject
    500      -0.7820    3.475    11.36    -17.5
    499      -0.7485    3.397    11.11    -13.5
    498      -0.7229    3.352    10.96    none



It is obvious that these changes will also alter the shape of the normal 
distribution, which was fitted to the histogram for this experiment and 
shown in Fig. 9-1. The appearance of the curve remains much the same, 
however, and recomputation will not affect the conclusions drawn from 
the earlier figure. 
As with the previous criterion, one must never reject more than one 
point at a time. It can happen that more than one observation exceeds a 
rejection limit, but the data must be recalculated after rejection of only 
the largest one. The omission of a single reading which is presumed to be 
erroneous might alter considerably one’s picture of the error distribution. 


10-11 THE RANGE: A CONVENIENT ESTIMATOR 

When working in the laboratory it is frequently desirable to know the 
approximate precision of a given set of observations. One would like to 
know quickly whether the desired degree of precision has been reached 
with the number of readings that have been taken, or whether more should 
be taken, without spending the time to make the elaborate calculations 
indicated by the foregoing treatment. One way of estimating this is to 
observe the difference between the largest and smallest readings. This 
difference, called the range, has a probability distribution, just as the 
individual readings do. The distribution of the range depends, of course, 
on the nature of the distribution of the readings, but it is different from the 
latter. In particular, the distribution of the range is not a normal distribu- 
tion for readings which are distributed normally. 

The distribution of the range for readings which are distributed normally 
will be discussed in Chapter 12, but it is instructive to consider an approxi- 
mation to the expectation value of the range by the methods that have 
been introduced already. This approximation is better for large numbers 
of readings, but it is quite good for as few as four readings. 

We assume that the largest reading is just as much greater than the 
mean of all the readings as the smallest is less than this mean; that is, 
we assume that the mean of the readings is the same as the mean of the 
largest and smallest, although we do not use the numerical value of this 
mean directly. Then if R is the range, the absolute value of the residuals 
in the largest and smallest readings, which we assume to be equal to the 
errors of those readings, is R/ 2. 

By arguments similar to those used in the previous section we conclude
that since $(n - 2)$ of $n$ readings must certainly have residuals less than
$R/2$,

$$\mathcal{P}(|x| < R/2) = \frac{n - 2}{2n},$$

where $\mathcal{P}(|x| < R/2)$ is the probability of observing a residual less than
$R/2$, reckoned, like $P_X$, as half the total area between $-R/2$ and $R/2$.

Since there are no single readings with either $x > R/2$ or $x < -R/2$,
we assume that $\mathcal{P}(|x| > R/2)$ must be small enough such that no more
than half of a reading would lie outside this limit; this corresponds to
the argument of the previous section where we were trying to decide
whether a particular reading was part of a group. That is, we write

$$\tfrac{1}{2} = 2n\,\mathcal{P}(|x| > R/2),$$

we use

$$\tfrac{1}{2} - \mathcal{P}(|x| > R/2)$$

as a second estimate of $\mathcal{P}(|x| < R/2)$, and we set the average of the two
estimates of the latter equal to $P_{R/2}$. The result is

$$P_{R/2} = \frac{4n - 5}{8n}. \tag{10-30}$$

FIGURE 10-2 (horizontal axis: $n$, marked at 5, 10, 15, 20)

For $n = 10$, say, $P_{R/2} = 0.438$, and reference to Table A shows that
$R/2\sigma = 1.535$, or $R/\sigma = 3.07$.

Since the standard deviation in the mean of the 10 readings is given by

$$\sigma_0 = \sigma/\sqrt{10},$$

we have

$$\sigma_0/R = 0.103.$$

The handy reference curve shown in Fig. 10-2 was prepared in this way. 
The numerical results of the exact derivation given in Chapter 12 are not 
very different from these so long as n > 5. For n < 5, the probability 
of finding the range to be near its expectation value becomes so low that 
its discussion is more an exercise in mathematics than in practicality. 
In practice, if 10 readings were to show a range of 0.8, say, then a quick 
estimate of the standard deviation of the mean of the ten readings would 
be 0.08. 
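
The quick estimate from the range is easily mechanized. The Python sketch below assumes that Eq. (10-30) has the form $P_{R/2} = (4n - 5)/(8n)$, which reproduces the value 0.438 quoted for $n = 10$; the function names are hypothetical, and `statistics.NormalDist().inv_cdf` stands in for the lookup in Table A.

```python
import math
from statistics import NormalDist

def sigma0_over_range(n):
    """Approximate ratio sigma_0 / R for n normally distributed readings."""
    p_half = (4 * n - 5) / (8 * n)    # assumed form of Eq. (10-30), a half-area
    r_over_sigma = 2.0 * NormalDist().inv_cdf(0.5 + p_half)
    return 1.0 / (r_over_sigma * math.sqrt(n))

def quick_sigma0(readings):
    """Estimate the standard deviation of the mean from the range alone."""
    r = max(readings) - min(readings)
    return r * sigma0_over_range(len(readings))

print(f"sigma_0/R for n = 10: {sigma0_over_range(10):.3f}")
for n in (5, 10, 15, 20):
    print(n, round(sigma0_over_range(n), 3))
```

A table of this ratio against $n$ serves the same purpose as a reference curve: multiply the observed range by the tabulated factor to decide whether more readings are needed.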


PROBLEMS 

1. For one unknown and equally weighted observations, show that 

$$\sum v^2 = \sum M_i^2 - \left(\sum M_i\right)^2/n,$$

where $M_i$ is an individual observation. Note then that the average $M$ and
$\sum v^2$ can be computed by going through a single summation process on a
desk calculator which can accumulate a factor and a product simul- 
taneously. 

2. Ten measurements of the specific gravity of a solution gave the results: 

1.0662 1.0664 1.0677 1.0663 1.0645 

1.0673 1.0659 1.0662 1.0680 1.0654 

Find the most probable specific gravity, the best estimate of the standard 
deviation for the distribution from which those were drawn, and the 
standard deviation in the most probable specific gravity. 

Answer: $\sigma = 1.05 \times 10^{-3}$, sp. gr. $= 1.06639 \pm 3.3 \times 10^{-4}$

3. For (y, x)-data that obey 

$$y = a + bx$$

show that

$$\sum v^2 = \sum y^2 - \frac{(\sum y)^2(\sum x^2) - 2(\sum x)(\sum y)(\sum xy) + n(\sum xy)^2}{n(\sum x^2) - (\sum x)^2}.$$

Note that this expression for $\sum v^2$, as well as the one in Problem 1, involves
finding a small difference between two large numbers.

4. For ( y , x)-data that obey 

y = ax (1 + bx) 

show that the solutions for a and b are the same as when one treats 
y = ax + cx 2 and then sets b = c/a. 

5. For (y, x)-data that obey 



show that the solutions for a and b are the same as when one treats 
y = A + Bx and then sets a = l/B, b = A/B. 
6. For (y, x)-data that obey 

y = ax + bx 2 

it is convenient for plotting to define Y = (y/x). In order to treat 

Y = a + bx, 

show what weight must be given each Y in order to get the correct results 
for a and b. 

7. For (y, x)-data that obey 

y = a + bx + cx 2 

find expressions for the constants that are convenient when there are an 
odd number n of data pairs and the values of x have an equal spacing d. 
(See Appendix 3. For further discussion of the fitting of polynomials, see 
reference 10.) 

Answer: For instance,

$$cd^2 = \frac{15\left[\sum(12i^2 - n^2 + 1)\,y_i\right]}{n(n^2 - 1)(n^2 - 4)}$$

8. Measurements of the ordinates of points on a straight line corresponding to 
exactly known abscissas 4, 6, 8, 9 are made with results 5, 8, 10, 12. What 
is the most probable equation of the line and what is the best estimate of 
the standard deviation in an observation of unit weight? 

Answer: $y = -0.288 + 1.339x$, $\sigma = 0.39$

9. Solve the following equations for the most probable values of x and y: 

$$x + y = 10.0 \pm 0.36,$$
$$4x - y = 19.0 \pm 0.51,$$
$$2x + 3y = 25.0 \pm 0.51.$$

(See Eq. (10-14). It is usually most convenient to give unit weight to the 
observation with the greatest error.) 

Answer: x = 5.839, y = 4.387 

10. Four observations of the angle A of a triangle gave a mean of 36° 25' 47", 
two observations of B gave a mean of 90° 36' 28", and three on C gave 
52° 57' 57". Adjust the triangle. 

Answer: A = 36° 25' 44.23", B = 90° 36' 22.46", C = 52° 57' 53.31" 

11. The unknowns x, y, z are subject to the condition 

$$x + 2y + 3z = 36.$$

Observations are made with the weights as noted: $x = 4.3$, wt. 1; $y = 5.7$,
wt. 4; $z = 7.3$, wt. 9. What are the adjusted values?

Answer: $x = 3.768$, $y = 5.433$, $z = 7.122$
12. A mass analysis is run of a sample of magnesium, and the following ratios
of the concentrations of the three isotopes are observed:

$$\frac{\mathrm{Mg}^{24}}{\mathrm{Mg}^{25}} = 7.9, \qquad \frac{\mathrm{Mg}^{24}}{\mathrm{Mg}^{26}} = 7.0, \qquad \frac{\mathrm{Mg}^{25}}{\mathrm{Mg}^{26}} = 0.9.$$

What is the indicated composition of the sample?

Answer: $\mathrm{Mg}^{24}$ 78.7%, $\mathrm{Mg}^{25}$ 9.9%, $\mathrm{Mg}^{26}$ 11.4%

13. Oxygen will dissolve in solid uranium dioxide to give a material whose
composition is described as $\mathrm{UO}_{2+x}$. A theory exists by which $x$ can be
related to the existing partial pressure $p$ of oxygen. At 1100°C the relation
is expected to be

$$\log p = 2\log\frac{x}{a - x} - 1.770x - 5.155$$

for $p$ expressed in atm. Because of the method used to measure these low
pressures, it is proper to consider that $p$ is measured directly. With the
following data, supposing that $x$ is known exactly, estimate a starting value
of $a$ and then determine a least-squares correction to that value.

$x$ $p$ ($\times 10^{9}$)

0.03 5.5 

0.04 9.8 

0.05 15.9 

0.06 21.7 

Answer: a = 1.0111 

14. The capacity of a condenser is known to be 14.0 mf. It is divided into 5 
sections, a, b, c, d, e, and it is known that the difference between b and d 
is 1.5 mf. Weighted observations on the individual sections are: 



    Section    Capacity (mf)    wt.
    a          2.02             3
    b          4.13             2
    c          2.52             5
    d          2.67             7
    e          2.84             4

(a) Find the most probable capacities of the sections.

(b) Recognizing that the existence of condition equations increases the number
of degrees of freedom, find the best estimate of the standard deviation
of an observation of unit weight.

Answer: (a) $a = 2.008$, $b = 4.125$, $c = 2.487$, $d = 2.625$, $e = 2.755$
(b) $\sigma = 0.156$

15. Readings taken on the successive nodes in a Kundt’s tube experiment 
were 10.1, 15.1, 24.9, 34.9, 45.0 cm. Find the most probable value of the 
half-wavelength of the sound in the tube. 

Answer: $\lambda/2 = 8.96$ cm
16. Successive measurements of a particular quantity gave the results: 19.1, 
17.0, 17.9, 20.5, 18.4, 22.0, 15.8, 16.5, 13.7, 14.9. It is desired that the 
standard deviation in the mean be less than unity. 

(a) With the aid of Fig. 10-2 decide whether any more readings need be 
taken. 

(b) Find the standard deviation in the mean and compare with the estimate 
made in part (a). 

(c) The result 22.0 seems somewhat large. Use Chauvenet’s criterion to 
see if it should be rejected. 

Answer: (a) $\sigma_0 \approx 0.85$ (b) $\sigma_0 = 0.80$ (c) residual at 22.0 = 4.42,
$1.96\sigma = 4.98$; therefore 22.0 must be kept.

17. Observations are made of the expansion of amyl alcohol with change in 
temperature as follows: 


    $V$ (cm³)    1.04    1.12    1.19    1.24    1.27
    $t$ (°C)     13.9    43.0    67.8    89.0    99.2

If $V = 1 + Bt + Ct^2$ expresses the law relating the volume and temperature,
find the most probable values of $B$ and $C$.

Answer: $B = 2.897 \times 10^{-3}$ cm³/°C, $C = -1.904 \times 10^{-6}$ cm³/(°C)²

18. (Warning: Parts (b) and (c) of the following problem require very extensive
computation.) Given the following data on the index of refraction of fused
quartz as a function of wavelength:

    $\lambda$ ($10^{-5}$ cm)    $n$
    1.936             1.560
    2.313             1.519
    2.749             1.496
    3.404             1.479
    4.340             1.467
    5.086             1.462

(a) Find a method of plotting this data that would yield a straight line if
it follows Sellmeier’s formula,

$$n^2 = 1 + \frac{A\lambda^2}{\lambda^2 - B},$$

and estimate $A$ and $B$.

(b) Supposing that the observations of $n$ are equally weighted, use the
method of Section 10-7 to find least-squares corrections to the values of $A$
and $B$ found in part (a).

(c) Supposing the observations of $n$ are equally weighted, use the form of
equation found in (a) in order to compute $A$ and $B$ directly without the
method of Section 10-7.

Answer: $A = 1.0971$, $B = 0.8750 \times 10^{-10}$ cm²

(Note. The exact results will depend on the number of figures carried
and on round-off errors, which will vary according to decisions made. The
authors carried four decimal places in every column. The results given
above were for part (c). With one pass in part (b), i.e., without a recalculation
of new corrections to those found the first time through, the authors
found $A = 1.0972$, $B = 0.8747 \times 10^{-10}$ cm².)

19. Two criteria for the rejection of observations are given in the text, one 
based on a fixed probability, and Chauvenet’s criterion. At what numbers 
of observations are the two equivalent when the probability used in the 
first is the 0.1% used in the text, 1% and 5%? 

Answer: 

    P        n
    0.001    500
    0.01     50
    0.05     10
PROPAGATION OF ERRORS 


In the last chapter we discussed methods of determining those values of 
the desired unknown quantities which are rendered most probable by the 
set of data in hand. The methods are general enough so that they are 
applicable to arbitrary numbers of unknown quantities related to one an- 
other via observation equations that are linear or can be made so. On 
the other hand, the degree of exactitude to which an unknown has been 
determined was only discussed for one unknown. It is the principal business 
of this chapter to rectify this situation. To do so we must begin by dis- 
cussing the general subject of the propagation of errors. The purpose of 
treating this latter subject is to answer the question, “Given some set of 
numbers and their errors, what is the error in some prescribed function 
involving these numbers?” 

As a very simple example, consider the problem of determining the 
volume of a right circular cylinder from measurements of its diameter d 
and height $h$. The volume is given by $V = \tfrac{1}{4}\pi d^2 h$. If we could specify
the maximum possible error $\Delta d$ in $d$ and the maximum possible error $\Delta h$ in
$h$, then we could easily calculate the maximum possible error $\Delta V$ in the
volume. Using the methods of Chapter 3, we would find the maximum
error in the volume to be

$$\Delta V = \frac{\partial V}{\partial d}\,\Delta d + \frac{\partial V}{\partial h}\,\Delta h = \frac{\pi}{2}\,dh\,\Delta d + \frac{\pi}{4}\,d^2\,\Delta h. \tag{11-1}$$

However, as we saw in Chapter 7, it is usually not possible to specify a 
maximum error, i.e., an error which is guaranteed never to be exceeded. 
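
When limits $\Delta d$ and $\Delta h$ are simply assumed, Eq. (11-1) is a one-line computation. The Python sketch below uses hypothetical dimensions and error limits chosen only for illustration, and compares the first-order result with the volume change obtained when both dimensions are pushed to their upper limits.

```python
import math

def cylinder_volume(d, h):
    """Volume of a right circular cylinder of diameter d and height h."""
    return math.pi * d * d * h / 4.0

def max_volume_error(d, h, delta_d, delta_h):
    """First-order maximum error in the volume, Eq. (11-1)."""
    return (math.pi / 2.0) * d * h * delta_d + (math.pi / 4.0) * d * d * delta_h

# Hypothetical measurements and assumed error limits (same length unit throughout).
d, h = 2.0, 5.0
delta_d, delta_h = 0.01, 0.02

linear = max_volume_error(d, h, delta_d, delta_h)
exact = cylinder_volume(d + delta_d, h + delta_h) - cylinder_volume(d, h)
print(f"Eq. (11-1): {linear:.5f}   direct difference: {exact:.5f}")
```

The two figures differ only in the second-order terms that the Taylor expansion drops.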

The standard deviation, on the other hand, can be estimated with
reasonable ease. As a measure of the spread, it represents an error that
has a 32% chance of being exceeded; this can be seen by referring to
Table A at the end of the book. Since a maximum error cannot be found,
or even defined, we must calculate the standard deviation $\sigma_V$ in the
volume in terms of the standard deviations $\sigma_h$ and $\sigma_d$ in the height and
diameter respectively. To find the relationship between $\sigma_V$, $\sigma_h$, and $\sigma_d$,
we start with the fundamental definition of the standard deviation $\sigma$
of the individual readings from an infinite parent distribution:



Usually h and d are each obtained from the average of a series of readings 
so that our final interest centers on the standard deviations of the averages. 
Furthermore, it is necessary to recognize that the “true” values of h or d 
are as likely to be less than as to be greater than the averages used for the 
values of h and d. Hence it is not certain whether the effect of each error 
is to increase or decrease the error in the volume; this, too, is a statistical
matter. 

The above example was introduced as an illustration of the problem; 
rather than proceeding to its specific solution we shall consider the general 
problem and calculate relationships that can be used for all special prob- 
lems. 


11-1 GENERAL PROBLEM 


In the general problem of the propagation of errors we shall assume that
$G$ is to be calculated from measurements, designated as $M_1, M_2, \ldots, M_r$.
It will be convenient if the reader accepts as the meaning of the symbol
$M_i$ some one of the variables of which $G$ is a function and also, at the
appropriate points, a particular value of that variable. If $G$ is a function
$f(M_1, M_2, \ldots, M_r)$, we mean by $\partial f/\partial M_i$ the partial derivative with
respect to the variable $M_i$, with the understanding that this derivative is
then to be evaluated at some particular set of values of all the variables
including $M_i$. We are concerned in this book with the handling of numbers;
the derivatives then are useful only after they have been evaluated.

Let us assume that a single measurement has been made of each of the
$M_i$'s and that the true errors are $x_1, x_2, \ldots, x_r$ in $M_1, M_2, \ldots, M_r$
respectively. The true error $X$ in $G$ is then given approximately by

$$X = \frac{\partial f}{\partial M_1}\,x_1 + \frac{\partial f}{\partial M_2}\,x_2 + \cdots + \frac{\partial f}{\partial M_r}\,x_r, \tag{11-2}$$


provided that the $x$'s are not too large. If another set of measurements is
made of the $M_i$'s with true errors $x'_1, x'_2, \ldots, x'_r$ respectively, the
corresponding error $X'$ in $G$ will be

$$X' = \frac{\partial f}{\partial M_1}\,x'_1 + \frac{\partial f}{\partial M_2}\,x'_2 + \cdots + \frac{\partial f}{\partial M_r}\,x'_r.$$

Suppose that this process is continued until $n$ observations have been
made on each of the $M_i$'s. The standard deviation for the individual
values of $G$ is then given by

$${\sigma'_G}^2 = \lim_{n \to \infty} \frac{\sum X^2}{n},$$


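where the prime on $\sigma'_G$ indicates that it is an estimate of the standard
deviation of a single reading of $G$. There are $n$ equations similar to
Eq. (11-2). The squares of these equations are

$$X^2 = \left(\frac{\partial f}{\partial M_1}\,x_1\right)^2 + \left(\frac{\partial f}{\partial M_2}\,x_2\right)^2 + \cdots + 2\,\frac{\partial f}{\partial M_1}\frac{\partial f}{\partial M_2}\,x_1 x_2 + 2\,\frac{\partial f}{\partial M_1}\frac{\partial f}{\partial M_3}\,x_1 x_3 + \cdots,$$

$$X'^2 = \left(\frac{\partial f}{\partial M_1}\,x'_1\right)^2 + \left(\frac{\partial f}{\partial M_2}\,x'_2\right)^2 + \cdots + 2\,\frac{\partial f}{\partial M_1}\frac{\partial f}{\partial M_2}\,x'_1 x'_2 + 2\,\frac{\partial f}{\partial M_1}\frac{\partial f}{\partial M_3}\,x'_1 x'_3 + \cdots.$$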

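If we add these equations and divide by $n$, we obtain

$$\frac{\sum X^2}{n} = \left(\frac{\partial f}{\partial M_1}\right)^2 \frac{\sum x_1^2}{n} + \left(\frac{\partial f}{\partial M_2}\right)^2 \frac{\sum x_2^2}{n} + \cdots + 2\,\frac{\partial f}{\partial M_1}\frac{\partial f}{\partial M_2}\left(\frac{\sum x_1 x_2}{n}\right) + 2\,\frac{\partial f}{\partial M_1}\frac{\partial f}{\partial M_3}\left(\frac{\sum x_1 x_3}{n}\right) + \cdots. \tag{11-3}$$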


Each of the terms that are to be added in the evaluation of $\sum x_1^2$, $\sum x_2^2$, etc.,
being a square, is positive. Thus the quantities $(\sum x_1^2)/n$, $(\sum x_2^2)/n$, etc.,
rapidly approach constant values as $n$ approaches $\infty$. The terms which
are to be added to calculate $\sum x_1 x_2$, $\sum x_1 x_3$, etc., however, are just as likely
to be positive as negative. Hence the quantities $(\sum x_1 x_2)/n$, $(\sum x_1 x_3)/n$,
etc., rapidly approach zero as $n$ approaches $\infty$. Thus in the limit as $n$
approaches infinity, Eq. (11-3) becomes

$${\sigma'_G}^2 = \left(\frac{\partial f}{\partial M_1}\,\sigma'_1\right)^2 + \left(\frac{\partial f}{\partial M_2}\,\sigma'_2\right)^2 + \cdots + \left(\frac{\partial f}{\partial M_r}\,\sigma'_r\right)^2. \tag{11-4}$$


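Equation (11-4) gives the relationship between the standard deviations
of the infinite parent distributions, but we are most interested in the
uncertainties of the final results, the averages of a finite number $n$ of the
readings. For $n$ readings on each of the $M_i$'s,

$$\sigma_G^2 = \frac{{\sigma'_G}^2}{n}, \qquad \sigma_1^2 = \frac{{\sigma'_1}^2}{n}, \qquad \sigma_2^2 = \frac{{\sigma'_2}^2}{n}, \qquad \ldots,$$

where the unprimed values of $\sigma$ are for the averaged $G$, $M_1$, $M_2$, etc.
Therefore

$$\sigma_G^2 = \left(\frac{\partial f}{\partial M_1}\,\sigma_1\right)^2 + \left(\frac{\partial f}{\partial M_2}\,\sigma_2\right)^2 + \cdots + \left(\frac{\partial f}{\partial M_r}\,\sigma_r\right)^2. \tag{11-5}$$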


Equation (11-5) is the expression generally used for propagation of errors 
and is one of the most important equations in the whole subject of physical 
measurements. Although it is derived here on the assumption that each 
quantity is measured the same number of times, the result is independent 
of this assumption. 
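
As an illustration only, the general relation of Eq. (11-5) can be sketched numerically. The function name `propagate`, the central-difference step, and the sample numbers below are our own assumptions and are not taken from the text; the partial derivatives are approximated numerically rather than found analytically.

```python
import math

def propagate(f, values, sigmas, rel_step=1e-6):
    """Standard deviation of f(*values) in the sense of Eq. (11-5).

    The partial derivatives are estimated by central differences
    (an assumption of this sketch) and evaluated at the given values.
    """
    total = 0.0
    for i, (m, s) in enumerate(zip(values, sigmas)):
        h = rel_step * (abs(m) if m != 0 else 1.0)
        up = list(values)
        dn = list(values)
        up[i] = m + h
        dn[i] = m - h
        dfdm = (f(*up) - f(*dn)) / (2.0 * h)   # approximates df/dM_i
        total += (dfdm * s) ** 2               # one squared term of Eq. (11-5)
    return math.sqrt(total)

# Hypothetical measurements: G = M1 * M2 with M1 = 3 +/- 0.1 and M2 = 4 +/- 0.2.
sigma_G = propagate(lambda m1, m2: m1 * m2, [3.0, 4.0], [0.1, 0.2])
print(round(sigma_G, 4))
```

For this product the partial derivatives are 4 and 3, so the result should agree with the square root of $(4 \times 0.1)^2 + (3 \times 0.2)^2 = 0.52$.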

Equation (11-5) may now be applied to the problem of the volume of a
right circular cylinder with which this chapter was introduced. We see
that the result will be similar to Eq. (11-1) if $\Delta V$, $\Delta d$, and $\Delta h$ are replaced
by $\sigma_V$, $\sigma_d$, and $\sigma_h$ respectively, and each of the terms squared; i.e.,

$$\sigma_V^2 = \left(\frac{\pi}{2}\,d h\,\sigma_d\right)^2 + \left(\frac{\pi}{4}\,d^2\,\sigma_h\right)^2. \tag{11-6}$$

If for a cylinder of the given size we substitute for $\Delta d$ and $\Delta h$ in Eq. (11-1)
the values of $\sigma_d$ and $\sigma_h$ respectively, we will obtain a value for $\Delta V$ which
will necessarily be greater than the value of $\sigma_V$. This fact can be illustrated
graphically as in Fig. 11-1. The hypotenuse $\sigma_V$ of the right triangle is
less than the sum $\Delta V$ of the lengths of the sides.
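
The comparison between the linear sum of Eq. (11-1) and the quadrature sum of Eq. (11-6) can be checked with a few lines of arithmetic. The cylinder dimensions and spreads below are hypothetical values chosen for illustration, and the function name is ours.

```python
import math

def cylinder_errors(d, h, sigma_d, sigma_h):
    """Return (linear sum as in Eq. 11-1, quadrature sum as in Eq. 11-6)."""
    dV_dd = math.pi * d * h / 2.0      # partial derivative of V with respect to d
    dV_dh = math.pi * d * d / 4.0      # partial derivative of V with respect to h
    linear_sum = dV_dd * sigma_d + dV_dh * sigma_h
    quadrature = math.hypot(dV_dd * sigma_d, dV_dh * sigma_h)
    return linear_sum, quadrature

# Hypothetical cylinder: d = 2.0, h = 5.0 with spreads 0.01 and 0.02 (same units).
delta_V, sigma_V = cylinder_errors(2.0, 5.0, 0.01, 0.02)
print(f"linear sum {delta_V:.4f}, quadrature {sigma_V:.4f}")
```

The two terms are the legs of the right triangle of Fig. 11-1, so the quadrature result is always the smaller of the two whenever both terms are nonzero.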



People not trained in statistical procedures often use equations like 
(11-1) instead of (11-6) when it is the latter that they should use. An 
equation like Eq. (11-1) always gives a more pessimistic statement of the 
propagated error than does Eq. (11-6). Although it is better to be too 
pessimistic than to be too optimistic in reporting the magnitude of the 
errors in one’s experimental results, it is a disservice to one’s colleagues to 
report a much larger error than is warranted. The scientist must build 
on the works of others and use the building blocks developed by others.
Blocks that are thought to be badly misshapen are as useless as ones that
are truly misshapen. Moreover, $\Delta V$ calculated from Eq. (11-1) cannot
be described as a standard deviation.

The reason for the difference between these two types of equations is 
obvious when one looks at the derivation of the second equation, where 
the cross product terms in Eq. (11-3) were dropped because they were as 
likely to be positive as negative. Thus equations like Eq. (11-6) take 
advantage of the fact that there is a high probability that the errors in 
the diameter and height will tend to cancel each other. 

11-2 SPECIAL CASES 

While Eq. (11-5) has a general applicability to all problems of this type, 
a number of special cases occur so frequently that it is worth while 
remembering special formulas for them. 

Two observations $M_1$ and $M_2$ are required in determining a length
with a meter stick, and the length is $L = M_1 - M_2$. From the general
equation (11-5) we see that $\sigma_L^2 = \sigma_1^2 + \sigma_2^2$, and since the two observations
are of the same kind, $\sigma_1 = \sigma_2 = \sigma$, so that

$$\sigma_L = \sigma\sqrt{2}. \tag{11-7}$$

Other special cases which arise frequently are: $G = \sum M_i$, for which

$$\sigma_G^2 = \sum \sigma_i^2, \tag{11-8}$$

and $G = \sum a_i M_i$, for which

$$\sigma_G^2 = \sum a_i^2 \sigma_i^2. \tag{11-9}$$

It should be noted that this last relation applies to weighted averages
in which the average is given by

$$A = \frac{w_1 R_1 + w_2 R_2 + \cdots}{\sum w},$$

for which

$$\sigma_A^2 = \left(\frac{w_1 \sigma_1}{\sum w}\right)^2 + \left(\frac{w_2 \sigma_2}{\sum w}\right)^2 + \cdots.$$

From Eq. (10-14), we find that $\sigma_1^2 = \sigma^2/w_1$, $\sigma_2^2 = \sigma^2/w_2$, etc., where
$\sigma$ is the standard deviation of the observation of unit weight. Hence
$\sigma_A^2 = \sigma^2/\sum w$. This last relation could be obtained directly from
Eq. (10-14), since the weight of the average $A$ is $\sum w$.

One of the most important special cases is that in which

$$G = k M_1^a M_2^b M_3^c \cdots \qquad (k = \text{const}). \tag{11-10}$$
This type of equation occurs very frequently, and the resulting special
formula is so simple that it can be remembered and used for mental
calculations. If one applies the general equation (11-5) to the case of three
variable factors, for instance, he gets

$$\sigma_G^2 = \left(a k M_1^{a-1} M_2^b M_3^c\,\sigma_1\right)^2 + \left(b k M_1^a M_2^{b-1} M_3^c\,\sigma_2\right)^2 + \left(c k M_1^a M_2^b M_3^{c-1}\,\sigma_3\right)^2.$$

If the left-hand side of this relation is divided by the square of the
left-hand side of Eq. (11-10) and the right-hand side by the square of the
right-hand side of Eq. (11-10), the result is

$$\left(\frac{\sigma_G}{G}\right)^2 = \left(\frac{a\sigma_1}{M_1}\right)^2 + \left(\frac{b\sigma_2}{M_2}\right)^2 + \left(\frac{c\sigma_3}{M_3}\right)^2. \tag{11-11}$$

The quantities $\sigma_G/G$, $\sigma_1/M_1$, etc., are often called the fractional errors
in $G$, $M_1$, etc. Since the $\sigma$'s are proportional to all other measures of
spread, it is obvious that any of the equations that have been developed
here can be used by merely replacing the $\sigma$'s by any other measure of
spread such as $p$, a.d., etc.

The relation given by Eq. (11-11) is most frequently used for calculating 
percentage of error in problems where the relation between the measured 
quantities is given by Eq. (11-10). Consider the problem of the volume 
of a right circular cylinder given in the introduction to this chapter. It 
can be seen that if the spread in the errors amounts to one percent for 
both the diameter and the height measurements, then the percentage of 
error in the volume is 

$$\sqrt{1^2 + 2^2} = 2.2\%,$$

an operation which can be performed mentally. When propagation-of-error
problems involve equations like (11-10), by far the simplest method
of calculating is given by Eq. (11-11).
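
The mental calculation can be mirrored in a short sketch of Eq. (11-11). The helper name `fractional_error` is hypothetical; the exponents 2 and 1 are those of the diameter and height in $V = \tfrac{\pi}{4} d^2 h$, and the one-percent spreads are the ones assumed in the paragraph above.

```python
import math

def fractional_error(exponents, fractional_sigmas):
    """Fractional (or percentage) error of G = k * M1**a * M2**b * ..., Eq. (11-11)."""
    return math.sqrt(sum((a * s) ** 2 for a, s in zip(exponents, fractional_sigmas)))

# Cylinder volume: exponent 2 for the diameter, 1 for the height, 1% spread in each.
percent_V = fractional_error([2, 1], [1.0, 1.0])
print(f"{percent_V:.1f}%")   # prints 2.2%
```

The constant $k$ does not appear because it cancels when Eq. (11-5) is divided by the square of Eq. (11-10).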

At this point the reader should be warned against a very common
mistake; i.e., not making sure that the various observed quantities
$M_1$, $M_2$, etc., are the directly measured quantities and are not dependent
on one another. An excellent example of this mistake is found in a common
college physics experiment in which the index of refraction of a slab
of glass is obtained by means of a microscope with a scale indicating the
vertical position of the microscope tube. In carrying out the experiment
one places the slab on the microscope stage and observes the positions
$M_1$ and $M_2$ of the microscope when it is focused on the top surface and
the bottom surface of the slab, respectively. A third observation $M_3$ is
obtained by observing the position of the microscope tube when the slab
of glass is removed and the microscope is focused on the stage. The index
of refraction is given by $\mu = (M_1 - M_3)/(M_1 - M_2)$. The solution for
$\sigma_\mu$ obviously contains three terms, and the term containing $\sigma_1$ will be found
to be quite small because errors in $M_1$ have a relatively small effect on the
index of refraction. This is true because if $M_1$ is large, the numerator and
denominator are both large, and the change in the value of $\mu$ is small.
New students often write this equation as $\mu = t/d$ and apply the simple
relation of Eq. (11-11). They fail to recognize that since $t = M_1 - M_3$
and $d = M_1 - M_2$, $t$ and $d$ are not independent and greater accuracy
is available than they have obtained.

Because Eq. (11-11) is so simple to apply, it should be used wherever 
it is applicable. Where it is not applicable, the general Eq. (11-5) should 
be used. 
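
The index-of-refraction warning can be made concrete with a small sketch. The scale readings and the common spread below are hypothetical; the first function applies Eq. (11-5) to the three direct readings, and the second reproduces the mistaken use of Eq. (11-11) with $t$ and $d$ treated as if they were independent.

```python
import math

def sigma_mu_correct(m1, m2, m3, s):
    """Eq. (11-5) applied to mu = (M1 - M3)/(M1 - M2), same spread s in each reading."""
    t, d = m1 - m3, m1 - m2
    d_m1 = (d - t) / d**2        # partial derivative with respect to M1
    d_m2 = t / d**2              # partial derivative with respect to M2
    d_m3 = -1.0 / d              # partial derivative with respect to M3
    return s * math.sqrt(d_m1**2 + d_m2**2 + d_m3**2)

def sigma_mu_naive(m1, m2, m3, s):
    """Mistaken use of Eq. (11-11) on mu = t/d with t and d taken as independent."""
    t, d = m1 - m3, m1 - m2
    sigma_t = sigma_d = s * math.sqrt(2.0)     # Eq. (11-7) for each difference
    return (t / d) * math.sqrt((sigma_t / t) ** 2 + (sigma_d / d) ** 2)

# Hypothetical readings (mm): top surface 10.00, bottom surface 6.00, stage 4.00.
correct = sigma_mu_correct(10.0, 6.0, 4.0, 0.01)
naive = sigma_mu_naive(10.0, 6.0, 4.0, 0.01)
print(f"correct {correct:.5f}, naive {naive:.5f}")
```

With these numbers the term in $\sigma_1$ is the smallest of the three, and the naive estimate is the larger of the two results, in line with the remark that greater accuracy is available than the naive treatment suggests.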


11-3 STANDARD DEVIATIONS OF UNKNOWNS
CALCULATED FROM OBSERVATION EQUATIONS


The general method of calculating standard deviations given by
Eq. (11-5) can be applied to determine the standard deviations for the
unknown quantities, which we calculated by the method of least squares
in Chapter 10. For the most general case consider the normal equations
for weighted observations given by Eqs. (10-16). The solution for the
typical unknown, $B$ for instance, is

$$B = \frac{1}{\Delta}\begin{vmatrix} [waa] & [waM] & \cdots & [waq] \\ [wab] & [wbM] & \cdots & [wbq] \\ \vdots & \vdots & & \vdots \\ [waq] & [wqM] & \cdots & [wqq] \end{vmatrix}, \tag{11-12}$$

where

$$\Delta = \begin{vmatrix} [waa] & [wab] & \cdots & [waq] \\ [wab] & [wbb] & \cdots & [wbq] \\ \vdots & \vdots & & \vdots \\ [waq] & [wbq] & \cdots & [wqq] \end{vmatrix}. \tag{11-13}$$


If we expand the numerator of Eq. (11-12) in terms of the minors of the
second column, we can write $B$ as

$$B = \beta_1 [waM] + \beta_2 [wbM] + \cdots + \beta_q [wqM], \tag{11-14}$$

where, for instance,

$$\beta_1 = -\frac{1}{\Delta}\begin{vmatrix} [wab] & [wbc] & \cdots & [wbq] \\ [wac] & [wcc] & \cdots & [wcq] \\ \vdots & \vdots & & \vdots \\ [waq] & [wcq] & \cdots & [wqq] \end{vmatrix}. \tag{11-15}$$
We see then that the $\beta$'s are independent of the observations, all the latter
of which are contained in the sums $[waM]$, $[wbM]$, etc.

Obviously, Eq. (11-14) can be written in terms of the $M$'s since they
enter into that equation in the form of summations. That is,

$$B = \tau_1 M_1 + \tau_2 M_2 + \cdots + \tau_n M_n, \tag{11-16}$$

where the coefficient $\tau_1$ of $M_1$ is the sum of the first terms of each of the
summations. Therefore

$$\tau_1 = \beta_1 w_1 a_1 + \beta_2 w_1 b_1 + \beta_3 w_1 c_1 + \cdots + \beta_q w_1 q_1.$$

Similarly, the coefficient of $M_2$ is the sum of the second terms:

$$\tau_2 = \beta_1 w_2 a_2 + \beta_2 w_2 b_2 + \beta_3 w_2 c_2 + \cdots + \beta_q w_2 q_2.$$

Thus we obtain the solution for $B$ in terms of the observations $M_1$, $M_2$,
etc., so that we can apply the general equation (11-5) for propagation of
error and get

$$\sigma_B^2 = (\tau_1 \sigma_1)^2 + (\tau_2 \sigma_2)^2 + \cdots + (\tau_n \sigma_n)^2. \tag{11-17}$$


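From Eq. (10-14), however, we find $\sigma_1^2 = \sigma^2/w_1$, etc., where as usual $\sigma$
is the standard deviation of an observation of unit weight. Thus we can
write Eq. (11-17) as

$$\sigma_B^2 = \sigma^2 \left(\frac{\tau_1^2}{w_1} + \frac{\tau_2^2}{w_2} + \cdots + \frac{\tau_n^2}{w_n}\right). \tag{11-18}$$

It will be shown in Appendix 4 that

$$\frac{\tau_1^2}{w_1} + \frac{\tau_2^2}{w_2} + \cdots + \frac{\tau_n^2}{w_n} = \beta_2,$$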


where $\beta_2$ is the coefficient of $[wbM]$ in Eq. (11-14). Since $\sigma_B^2 = \sigma^2/w_B$,
obviously $w_B = 1/\beta_2$. This means that the simplest way to calculate the
weight of $B$ is to obtain a numerical value for $\beta_2$ by merely keeping the
summations $[waM]$, $[wbM]$, etc., in algebraic form, during the process of
solving for $B$, long enough to obtain numerical values for their coefficients.
This generally can be done without any extra labor at the time the original
normal equations are solved for $B$. To obtain the weights of $A$, $C$, etc.,
the normal equations should be solved in the usual manner, keeping the
summations in algebraic form. The results will be

$$\begin{aligned}
A &= \underline{\alpha_1}[waM] + \alpha_2[wbM] + \alpha_3[wcM] + \cdots, \\
B &= \beta_1[waM] + \underline{\beta_2}[wbM] + \beta_3[wcM] + \cdots, \\
C &= \gamma_1[waM] + \gamma_2[wbM] + \underline{\gamma_3}[wcM] + \cdots, \quad \text{etc.}
\end{aligned} \tag{11-19}$$
The important coefficients are $\alpha_1$, $\beta_2$, and $\gamma_3$, which are underscored in
Eqs. (11-19). It is shown in Appendix 4 that

$$w_A = \frac{1}{\alpha_1}, \qquad w_B = \frac{1}{\beta_2}, \qquad \text{and} \qquad w_C = \frac{1}{\gamma_3}.$$
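
A numerical check of these weight relations is easy to make for the straight line $M = A + Bb$, where the coefficient of $A$ is unity. The data below are hypothetical, and the variable names are ours. The sketch forms $\beta_1$ and $\beta_2$ from the two-by-two determinants, builds the $\tau$'s of Eq. (11-16), and compares the sum $\sum \tau_i^2/w_i$ of Eq. (11-18) with $\beta_2$.

```python
# Hypothetical abscissas and weights for a straight line M = A + B*b.
b = [0.0, 1.0, 2.0, 3.0]
w = [1.0, 2.0, 1.0, 2.0]

S_w = sum(w)                                        # [w]
S_wb = sum(wi * bi for wi, bi in zip(w, b))         # [wb]
S_wbb = sum(wi * bi * bi for wi, bi in zip(w, b))   # [wbb]
delta = S_w * S_wbb - S_wb ** 2                     # determinant of the normal equations

alpha1 = S_wbb / delta      # coefficient of [wM] in the solution for A
beta1 = -S_wb / delta       # coefficient of [wM] in the solution for B
beta2 = S_w / delta         # coefficient of [wbM] in the solution for B

# tau_i as in Eq. (11-16): B = sum(tau_i * M_i)
tau = [wi * (beta1 + beta2 * bi) for wi, bi in zip(w, b)]
direct = sum(t * t / wi for t, wi in zip(tau, w))   # bracket of Eq. (11-18)

w_A, w_B = 1.0 / alpha1, 1.0 / beta2
print(direct, beta2, w_A, w_B)
```

For these numbers $\Delta = 44$, so the direct sum and $\beta_2$ should both equal $6/44$.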

In addition to the determination of the errors in the constants A, B, 
etc., it is often desirable to know the error in the value of the function 
itself at some particular set of values of the independent experimental 
variables. This problem has been discussed by Birge [4] ; as he states, the 
error in the constant term of the equation for a straight line is the error 
in the function when the independent variable is zero. The extension to 
the case of a power series is easy. In any other case, the general principle 
of Birge remains valid. We shall reproduce Birge’s argument here. 

If we wish to have the error at some value of the independent variable $x$,
$\varepsilon$ say, we need only make what is called a linear transformation; that is,
shift the origin of $x$. Let $x' = x - \varepsilon$. Then the desired error is the error
in the new constant in the equation.

If we write out the solution by determinants for the constants and
compare it with Eq. (11-19), we see that

$$\alpha_1 = \Delta^{-1} \sum w_i x_i^2.$$

Hence the reciprocal of the weight of the new constant is

$$\alpha'_1 = \Delta^{-1} \sum w_i (x_i - \varepsilon)^2,$$

in which, as the reader should show, the value of $\Delta$ does not change.
Hence

$$\sigma^2(\varepsilon) = \sigma^2 \Delta^{-1} \sum w_i (x_i - \varepsilon)^2.$$
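
The shift-of-origin result can be sketched as a small function for the straight-line case. The function name `sigma_at`, the data, and the unit value of the standard deviation of an observation of unit weight are assumptions made for illustration.

```python
import math

def sigma_at(eps, x, w, sigma):
    """Standard deviation of the fitted straight line at x = eps (shifted-origin result)."""
    S_w = sum(w)
    S_wx = sum(wi * xi for wi, xi in zip(w, x))
    S_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    delta = S_w * S_wxx - S_wx ** 2            # unchanged by the shift of origin
    shifted = sum(wi * (xi - eps) ** 2 for wi, xi in zip(w, x))
    return sigma * math.sqrt(shifted / delta)

# Hypothetical abscissas and weights; sigma is for an observation of unit weight.
x = [0.0, 1.0, 2.0, 3.0]
w = [1.0, 2.0, 1.0, 2.0]
at_zero = sigma_at(0.0, x, w, 1.0)             # error in the constant term itself
x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
at_mean = sigma_at(x_bar, x, w, 1.0)           # error at the weighted mean of x
print(f"{at_zero:.4f} {at_mean:.4f}")
```

At the weighted mean of the $x$'s the expression reduces to $\sigma/\sqrt{\sum w}$, which is its smallest value; at $\varepsilon = 0$ it is the error of the constant term, as Birge states.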

11-4 AN EXAMPLE 

Before going further into more complex situations, it would be well to tie 
down some of the above discussion with an example. For this purpose, 
we shall continue the example of Table 10-2. 

We assume that every observation $y_i$ is a single reading, and all readings
have the same weight. All the calculations are made with Eqs. (10-16)
for $q = 2$ and unit $w_i$, and using in Eqs. (11-19)

$$\alpha_1 = \Delta^{-1} \sum w_i x_i^2 \qquad \text{and} \qquad \beta_2 = \Delta^{-1} \sum w_i.$$

We then obtain the result

$$\sigma = 3.88, \qquad \sigma_A = 3.91, \qquad \sigma_B = 4.57, \qquad \sigma_z(1) = 2.63,$$

where the last standard deviation, as is indicated, is that of the value
of $z$,

$$z = -2.49 + 68.08x,$$

when $x$ is unity.

11-5 INTERNAL AND EXTERNAL CONSISTENCY 
METHODS OF CALCULATING STANDARD 
DEVIATIONS FOR ONE UNKNOWN 

It frequently happens in the determination of an unknown quantity that 
there is a series of values each of whose errors represents the average of a 
set of single determinations. This situation could arise either as a result 
of work by one individual or a group of individuals, or as a result of 
attempts to combine the results of several distinct groups of workers. 
In the former case almost all of the original observations are available, 
whereas in the latter case only those averages and errors that have been 
given are available. 

If all the individual readings used to find each of the averages are known 
and if they are all of the same accuracy, then they should be combined 
and recalculated to give a single average; for all the readings came from 
the same universe, i.e., they all fit the same error curve, and the total 
number of readings allows a more precise estimation not only of the ex- 
pectation value of this curve but also of its precision index via the deter- 
mination of the associated standard deviation in a single reading. 

If the numbers of readings are not known, or if the readings were taken 
with different instruments which have different error curves associated 
with them, then we must assume that the averages are the result of 
readings drawn from different universes, which may have different precision 
indices and, unfortunately, different expectation values because of sys- 
tematic errors. Each average constitutes a single point drawn from what 
may be yet another universe, whose properties must be investigated. 
Here, the precision with which one can make this investigation is not as 
great, since there are available only the number of points equal to the 
available number of averages. Nevertheless, within limits, by considering 
the problem in the proper way, one can decide whether the various 
averages appear to differ from one another only as much as we would 
expect on the basis of the random errors involved in the determination 
of one of them, or whether the differences seem to be greater than this. 
In the latter case, we would suspect the existence of systematic errors. 
Such errors may be due to poorly zero-adjusted meters, instrument cali- 
brations different from the scale markings, or something much more 
subtle. 

An excellent introduction to this problem has been given by Birge [4]; 
the problem has also been discussed by Topping [11]. If we are given n 
readings on a particular quantity, the best estimate of the standard
deviation in one of these readings, i.e., the best estimate of the standard
deviation associated with the universe of readings from which these $n$
were drawn, is

$$\sigma = \sqrt{\frac{\sum v^2}{n - 1}}, \tag{11-20}$$

in which $\sum v^2$ is the sum of the squares of the differences (residuals)
between the readings and their mean.

The mean itself was determined by the use of $n$ observations. Thus its
weight is $n$ compared with unit weight for one of the observations, and, by
Eq. (10-14), its standard deviation is

$$\sigma_0 = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{\sum v^2}{n(n - 1)}}. \tag{11-21}$$


As Birge points out, Eq. (11-21) is a prediction of the standard deviation 
that would be found if enough observations of the mean were made so that 
the error of the mean could be found from an equation like (11-20), and 
if the same error curve held for each such observation of the mean. Thus 
suppose that N groups of n readings each are taken so that all the means 
have the same weight, and suppose that N means are found. Then the 
calculation of $\sigma_0$ from

$$\sigma_0 = \sqrt{\frac{\sum V^2}{N - 1}}, \tag{11-22}$$

where $\sum V^2$ is the sum of the squares of the differences between the
individual means and their mean, should be expected to nearly reproduce
the result found from Eq. (11-21).* Such will be the result if all $Nn$
readings are truly taken from the same universe, that is, they are taken
with the same instruments by the same people under the same conditions.†
There is complete consistency of the observations under these conditions.
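
The agreement between the prediction of Eq. (11-21) and the scatter of the means found from Eq. (11-22) can be tried out on simulated readings. Everything in the sketch below is assumed for illustration: the normal universe, its mean and width, the group sizes, and the choice to pool the within-group variances before applying Eq. (11-21).

```python
import math
import random
import statistics

random.seed(1)                       # fixed seed so the run is repeatable
N, n = 400, 10                       # N groups of n readings each (illustrative sizes)
groups = [[random.gauss(50.0, 2.0) for _ in range(n)] for _ in range(N)]

# Eq. (11-20) within each group (divisor n - 1), pooled over the groups,
# then Eq. (11-21): predicted standard deviation of a mean of n readings.
pooled_variance = statistics.mean([statistics.variance(g) for g in groups])
predicted = math.sqrt(pooled_variance / n)

# Eq. (11-22): standard deviation found from the scatter of the N group means.
means = [statistics.mean(g) for g in groups]
observed = statistics.stdev(means)   # divisor N - 1

print(f"predicted {predicted:.3f}, observed {observed:.3f}")
```

Because all the simulated readings come from one universe, the two numbers should nearly coincide, both lying close to $2/\sqrt{10}$.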

If, on the other hand, the various groups of observations are obtained 
by different people under varying circumstances and with different instru- 
ments, then such agreement is not to be expected. It is clear then that 
there is consistency only within individual groups. When one makes the 


* As is discussed by Birge [4], there is an error associated with the determination
of the standard deviation as well as the mean. The relative standard deviation
in the standard deviation is $1/\sqrt{2n}$. If $\sigma$ is the standard deviation in a single
reading of which $n$ have been taken, $\sigma/(n\sqrt{2})$ is the standard deviation in the
standard deviation of the mean. See Chapter 12.

† Actually, this is a poor way of obtaining so many readings, for if there is some
constant error in an instrument, such a procedure will never reveal it. It is much
preferable to take many averages from many sources.
assumption of internal consistency, he assumes that the precision index
for all the groups is the same and estimates this by an appropriate averaging 
process applied only to the precision indices, and not to the location of the 
corresponding distribution peaks. 

On the other hand, if one assumes that each group is drawn, effectively, 
from the same universe, he is assuming the existence of external con- 
sistency between the means of the separate groups. When he properly 
applies an equation like (11-22), in which the individual group means are 
used, he will get agreement with his previous result if this latter assumption 
is correct. 

The availability of many averages from many sources has its maximum 
usefulness when each of the averages is accompanied by its standard 
deviation and the number of readings involved. Failing this requirement, 
one can proceed, as indicated earlier, with the aid of the errors alone to 
judge whether all the averages came from the same universe.* 

When the reported averages are accompanied by standard deviations
or other measures of spread, we can determine the relative weights by
Eq. (10-14),

$$w_i = \frac{\sigma^2}{\sigma_i^2}.$$

We will frequently find it convenient to assign unit weight to the reading
with the largest error, though this is not necessary. The standard deviation
of the final result is also calculated from Eq. (10-14). Since the weight of
the final result is equal to the sum of the separate weights of the individual
averages,

$$\sigma_0 = \frac{\sigma}{\sqrt{\sum w}}, \tag{11-23}$$

where $\sigma$ is the standard deviation of the particular average that has been
assigned the unit weight. This constitutes the averaging technique mentioned
earlier with which one uses information from each group to estimate
the spread inherent in any one of them, which is assumed to be the same
for all. Equation (11-23) then gives the standard deviation calculated on
the assumption of internal consistency. That is, the $\sigma$'s and $w$'s are all
determined from the spread of the individual sets of observations. The
$\sigma_0$ of Eq. (11-23) is a function of the $\sigma$'s of the separate averages; the $\sigma$'s
depend only on the consistency of the readings making up those averages.
The $\sigma_0$ does not depend at all on how well these averages agree with one
another.


* With only the standard deviations and not the numbers of readings available 
one cannot decide whether the possibly different universes have different widths, 
i.e., whether a different standard deviation of single readings is to be associated 
with each of them. 
As mentioned earlier, it often happens that these averages do not agree
with one another nearly as well as we might expect from their individual
$\sigma$'s; either the quantities measured are not identical or there are
systematic errors in one or more of these averages. The best way to make an
organized examination of the extent of an apparent disagreement is to
calculate the standard deviation by the method of external consistency
and compare it with the result of Eq. (11-23). The equation to be used is
(10-28), which is more general than Eq. (11-22). In Eq. (10-28) the weights
w are found as for the internal-consistency calculation, and n is the number 
of groups. The residuals are between the weighted mean and the group 
means. The calculation of weights in the same way as before corresponds 
to the assumption, made here also, that all the universes from which the 
different groups are drawn have the same precision index. 

Birge [4] discusses numerical criteria on the basis of which one can 
decide whether an observed departure from equality of the results of 
Eq. (10-28) and Eq. (11-23) is or is not greater than one should expect 
by chance. Here, we shall defer the discussion of such criteria until the 
next chapter. For our present purpose, it will suffice to say that if there 
is a very large difference, we must consider the groups to be inconsistent. 

An example is to be found in a compilation of thermodynamic data 
on the alkali metals.* In Table 8 of this paper several values of the heat 
of vaporization of potassium were given, and some of these were starred. 
The starred values were those given the greatest weight by the authors 
of the compilation. With the exception of one which had no probable 
error attached, they will be used here as an illustration. 

The errors given in the paper are probable errors in the listed means 
of the results by individual investigators. Here, in Table 11-1, they 
have been converted to standard deviations by the use of Eq. (9-7). The 
assigned weights are inversely proportional to the squares of the errors. 
The reading of unit weight has $\sigma_0 = 0.0163$ kcal/mol. It is apparent
that there are systematic differences between the different observations 
which are much greater than the precision of any one of them. This can 
be seen by examining the residuals in the last column and by comparing 
the standard deviations as calculated by the two methods. The one 
calculated by external consistency is ten times as great as the one calcu- 
lated by internal consistency. 

In a case like this, where it is quite clear that the individual averages 
differ by more than one should expect, it is probably best to recompute 
the values of the mean and its standard deviation by treating each number 
as a single observation of unit weight. An error which creeps in through 
some unrealized effect or miscalibration, for instance, cannot be corrected 


* W. H. Evans, et al., J. Research Natl. Bur. Standards, 55, 83 (1955).
Table 11-1

HEAT OF VAPORIZATION OF POTASSIUM

    $\Delta H_0^\circ$, kcal/mol    $\sigma_0$    Weight $w$    $|\Delta H_0^\circ - \overline{\Delta H_0^\circ}|$, kcal/mol
    21.745                          …             13            …
    21.709                          …             118           …
    21.663                          …             1             …
    21.762                          …             30            …

    $(\sum w\,\Delta H_0^\circ)/(\sum w) = 21.7214 = \overline{\Delta H_0^\circ}$

    $0.0163/\sqrt{\sum w} = 0.0013$

    $\sqrt{\sum w(\Delta H_0^\circ - \overline{\Delta H_0^\circ})^2 / 3\sum w} = 0.013$


merely by taking more readings. In the present example, theoretical 
considerations* have led to the belief that it is the first entry in Table 11-1
which is correct rather than either of the two with larger weights; such 
additional information is not always available, however. 
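
The summary lines of Table 11-1 can be reproduced from the four listed means and their weights. In the sketch below the pairing of weights with values is read from the table, the unit-weight standard deviation is the 0.0163 kcal/mol quoted in the text, and the variable names are ours; the external-consistency line uses the form printed under the table, with $n - 1 = 3$ in the divisor.

```python
import math

# Values and weights as read from Table 11-1 (kcal/mol).
dH = [21.745, 21.709, 21.663, 21.762]
w = [13, 118, 1, 30]
sigma_unit = 0.0163                      # standard deviation of the reading of unit weight

sum_w = sum(w)
mean = sum(wi * xi for wi, xi in zip(w, dH)) / sum_w

# Internal consistency, Eq. (11-23).
sigma_internal = sigma_unit / math.sqrt(sum_w)

# External consistency, from the weighted residuals about the weighted mean.
n = len(dH)
weighted_sq = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, dH))
sigma_external = math.sqrt(weighted_sq / ((n - 1) * sum_w))

print(f"mean {mean:.4f}, internal {sigma_internal:.4f}, external {sigma_external:.3f}")
```

The ratio of the two standard deviations comes out close to ten, which is the disagreement described in the text.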

A very famous case of a similar nature was a conflict between the value 
of the electric charge on the electron as observed directly by measuring 
the rate of movement of a charged drop of oil in air between two charged 
plates and indirectly by the accurate measurement of the distance between 
the atoms in a crystal by the use of x-rays. The resolution of the
controversy is described by Birge.† The source of the discrepancy turned out
to be a systematic error in the direct observation; an erroneous value had
been used for the viscosity of air. An instructive quotation can be taken
from an article by DuMond and Cohen‡ on the least-squares evaluation
of various physical constants, “Before the oil-drop value of e had been 
corrected by revisions regarding the true viscosity of air, the [x-ray method] 
was questioned since it led to a value of e . . . then supposed to be too 
high. This was a fortunate thing since it led to a great deal of very careful 
and critical examination of all possible sources of error, both theoretical 
and experimental, in the above (x-ray) method ...” 

Intermediate cases involve difficult decisions. When there are only two 
averages, a fairly good criterion is to consider the averages inconsistent 
if they differ by more than twice the sum of the separate $\sigma$'s, and consistent
if they differ by less than this quantity. 


* R. J. Thorn and G. H. Winslow, J. Phys. Chem., 65, 1297 (1961). 
† R. T. Birge, Phys. Rev., 48, 918 (1935).

‡ J. W. M. DuMond and E. R. Cohen, Rev. Modern Phys., 20, 82 (1948). The
determination of the best values for the fundamental physical constants was 
discussed more recently by E. R. Cohen, J. W. M. DuMond, T. W. Layton, 
and J. S. Rollett in Rev. Modern Phys., 27, 363 (1955). 
When two averages are found to be inconsistent by the above criterion,
one should ignore the individual $\sigma$'s and consider the two averages $A_1$
and $A_2$ to be of equal weight, as was suggested above for the case of several
averages. The best value is then the arithmetic mean, $A = \tfrac{1}{2}(A_1 + A_2)$,
and the $\sigma_0$ for this mean is given by

$$\sigma_0 = \sqrt{\frac{\sum v^2}{n(n - 1)}}.$$

Since $n = 2$ and

$$v_1 = A_1 - A = \frac{A_1 - A_2}{2} = -v_2,$$

we find that

$$\sum v^2 = v_1^2 + v_2^2 = 2\left(\frac{A_1 - A_2}{2}\right)^2.$$

Therefore $\sigma_0 = \tfrac{1}{2}|A_1 - A_2|$.
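
The two-average rule can be collected into one small routine. This is only a sketch: the function name and the sample numbers are hypothetical, and the treatment of the consistent case (weights taken as the reciprocals of the squares of the $\sigma$'s, with the internal-consistency error of Eq. (11-23)) is our choice of how to finish the calculation rather than a prescription quoted from the text.

```python
import math

def combine_two(a1, s1, a2, s2):
    """Combine two averages; returns (best value, sigma_0, consistent flag)."""
    diff = abs(a1 - a2)
    if diff > 2.0 * (s1 + s2):
        # Inconsistent: equal weights, sigma_0 = |A1 - A2| / 2.
        return 0.5 * (a1 + a2), 0.5 * diff, False
    # Consistent: weights inversely proportional to the squares of the sigmas.
    w1, w2 = 1.0 / s1 ** 2, 1.0 / s2 ** 2
    best = (w1 * a1 + w2 * a2) / (w1 + w2)
    return best, 1.0 / math.sqrt(w1 + w2), True

inconsistent = combine_two(10.0, 0.1, 11.0, 0.1)   # differ by 1.0, limit is 0.4
consistent = combine_two(10.0, 0.3, 10.4, 0.4)     # differ by 0.4, limit is 1.4
print(inconsistent, consistent)
```

In the first call the averages differ by far more than twice the sum of their $\sigma$'s, so the result is the plain mean 10.5 with $\sigma_0 = 0.5$.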


11-6 INTERNAL AND EXTERNAL CONSISTENCY 
METHODS FOR CALCULATING STANDARD 
DEVIATIONS FOR MORE THAN ONE UNKNOWN 

In Section 10-3 (supplemented by Appendix 4), methods were given for 
determining the weights of the unknown quantities which were calculated 
by the method of least squares using observation equations. The discussion 
included the possibility of using weighted observations, that is, using two 
contributions to the weights of the unknown quantities. One of these is 
unequally weighted observations, represented by w in such sums of 
products as [waM] ; the other contribution is the actual distribution of 
the data. For example, in a straight-line case where M = A + Bb, if 
B and b are such that all the observations of M are far from A, the weight 
of A will be reduced as compared with the case of observations which are 
made near b equal to zero. The determination of these latter weights 
is a separate problem from the determination of the standard deviation of 
an observation of unit weight, and, in fact, is the problem discussed in 
Section 10-3 and Appendix 4. 

If many observations of M are taken at each of several values of b, 
they can be averaged, group by group, and standard deviations deter- 
mined for each of these several values of M. Weights can then be assigned 
to each of these values of M which are inversely proportional to the squares 
of their standard deviations, the unit weight being assigned to the value 
which has the largest standard deviation. These weights are the values of
w in equations such as (10-16), and the largest of the several standard 
deviations is the standard deviation of an observation of unit weight when 
errors in A and B are to be calculated on the basis of internal consistency. 

If, on the other hand, errors in A and B are to be calculated on the 
basis of external consistency, the standard deviation in an observation of 
unit weight is to be calculated from Eq. (10-26), and the values of w to 
be used in this latter equation are still those described above. 

A complete solution of the two-parameter example is then given by the 
following equations: 


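$$A = \frac{1}{\Delta}\begin{vmatrix} [wM] & [wb] \\ [wbM] & [wbb] \end{vmatrix}, \tag{11-24}$$

$$B = \frac{1}{\Delta}\begin{vmatrix} [w] & [wM] \\ [wb] & [wbM] \end{vmatrix}, \tag{11-25}$$

$$\Delta = \begin{vmatrix} [w] & [wb] \\ [wb] & [wbb] \end{vmatrix}, \tag{11-26}$$

$$w_A = \Delta/[wbb], \tag{11-27}$$

$$w_B = \Delta/[w], \tag{11-28}$$

$$\sigma_e^2 = [wvv]/(n - 2), \tag{11-29}$$

$$\sigma_i^2 = \sigma_{\max}^2; \tag{11-30}$$

$$\sigma_A = (\sigma_e \text{ or } \sigma_i)/\sqrt{w_A}, \tag{11-31}$$

$$\sigma_B = (\sigma_e \text{ or } \sigma_i)/\sqrt{w_B}. \tag{11-32}$$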


Equations (11-24) through (11-26) follow from Eqs. (10-16). Equations 
(11-27) and (11-28) follow from Eqs. (11-19) and Appendix 4. In Eqs. 
(11-29) and (11-30) the subscripts e and i on the left-hand side designate, 
respectively, external and internal. Equations (11-31) and (11-32) follow 
from Eq. (10-14). 

In Eq. (11-29) the residuals v are the differences between the observed
values of M and those calculated with the values of A and B as given by
Eqs. (11-24) and (11-25) respectively, and n is the number of values of
M. In Eq. (11-30) $\sigma_{\max}$ is the largest of the errors associated with the
individual values of M. The errors $\sigma_e$ and $\sigma_i$ are the standard deviations
of an observation of unit weight for these two consistency assumptions.
It is not difficult to show that when the weight of the observation $M_k$ is
written as

$$w_k = C/\sigma_k^2, \tag{11-33}$$

the result of Eq. (11-29) is independent of the value of the constant of
proportionality C. On the other hand, Eq. (11-30) implies that C is
$\sigma_{\max}^2$. This choice is made because it is the most convenient; as a result,
the weights will generally be of a reasonable size regardless of the size
of the $\sigma_k$'s, and an observation of unit weight is brought into evidence.
Perhaps it should be added that if C in Eq. (11-33) is set equal to unity,
then $\sigma_i$ is also unity, and Eqs. (11-27) and (11-28) are directly the
reciprocals of the squares of the standard errors in A and B, respectively.

Let us investigate the following example. It is especially illustrative 
because it is real rather than made up and because it differs in one im- 
portant respect from the case described above rather than in spite of it. 
The difference is that the authors of the source of the example did not 
make many observations of M at a single value of b. Instead, they 
attempted to estimate standard deviations in the various quantities 
which entered into the calculation of the individual values of M. One 
might say that an attempt was made to estimate any further systematic 
errors which presumably could have been removed if they were sought and 
found. It will be seen that, fortuitously or not, the attempt appears to 
have been reasonably successful. 

Table 11-2

GRAPHITE VAPOR PRESSURE DATA

        Experiment 1                    Experiment 2
$10^4/T$     -log p*            $10^4/T$     -log p*

4.1339    4.641 ± 0.019         4.0634    4.460 ± 0.030
4.0437    4.323 ± 0.024         4.0437    4.381 ± 0.032
4.0290    4.266 ± 0.026         4.0000    4.231 ± 0.038
4.0258    4.256 ± 0.026         3.9777    4.159 ± 0.042


* The values of M = — log pressure used for our present purposes were not 
properly weighted to maintain the proper weights of the directly observed 
quantities (see Section 10-7); this is a complication which does not contribute 
to the matter being illustrated. 


The example consists in the measurements of the vapor pressure of
graphite by condensing a known fraction of the vapor emitted during a
known time interval from a source at a known temperature.† The values
to be used here are given in Table 11-2. Two different experiments were
done, and we give the values of the negative logarithm of the pressure (M)
in mm Hg for various values of reciprocal absolute temperature (b) for both.


† Argonne National Laboratory Report ANL-4264, The Vapor Pressure and
Heat of Sublimation of Graphite, by O. C. Simpson, R. J. Thorn, and G. H.
Winslow (1949, unpublished). These authors did not put straight lines through
the particular data which are being used for the present example.

Fig. 11-2. The "error bars," of total length twice the standard deviation given
for each point in Table 11-2, represent the effects of the standard deviations
estimated by the observers to be applicable to auxiliary measurements. It is
clear that the precision is great enough that it alone would not prevent the
discovery of the source of the systematic difference.

The results of treating these data according to the methods outlined in
this section are given in Table 11-3. The errors obtained with the external
consistency calculation are small compared with those of the internal
consistency calculation; this shows that the random error in the experiment
is smaller than the estimated errors in the measurement of auxiliary
quantities, such as distances from source to target, used to compute p from
the observed rate of deposition of the graphite vapor. Intercomparison of
the numbers in Table 11-3 and the graphic comparison in Fig. 11-2 indicate
that the rate of change of p with temperature was determined more precisely
than the absolute value of p at any one temperature. That is, the slopes of
the two lines agree quite well; it is the intercepts which are in disagreement.
In the temperature region in which the data were taken, the lines are
separated by amounts comparable to the estimated errors in ln p. This
does not prove that there was no unrecognized systematic difference
between the two experiments, but the odds favor the argument that
better measurements of the auxiliary quantities would remove the
systematic difference.

Table 11-3

COMPARISON OF CONSISTENCY ASSUMPTIONS

Consistency            Experiment 1                        Experiment 2
assumed            A                 B                 A               B

Internal     -10.07 ± 0.96      3.56 ± 0.24       -9.8 ± 2.1      3.51 ± 0.53
External     -10.066 ± 0.095    3.558 ± 0.023     -9.82 ± 0.41    3.512 ± 0.103
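
The calculation summarized in Table 11-3 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `fit_line` and the dictionary keys are ours, the weights are taken from Eq. (11-33) with C equal to the square of the largest standard deviation, and Eqs. (11-29) and (11-30) are read as giving the squares of the errors of an observation of unit weight. The data are those of Table 11-2.

```python
import math

def fit_line(b, M, sigma):
    """Weighted fit of M = A + B*b, following Eqs. (11-24) through (11-32)."""
    n = len(M)
    sigma_max = max(sigma)
    # Eq. (11-33) with C = sigma_max**2: the least precise point gets unit weight.
    w = [(sigma_max / s) ** 2 for s in sigma]
    S_w = sum(w)
    S_wb = sum(wk * bk for wk, bk in zip(w, b))
    S_wM = sum(wk * Mk for wk, Mk in zip(w, M))
    S_wbM = sum(wk * bk * Mk for wk, bk, Mk in zip(w, b, M))
    S_wbb = sum(wk * bk * bk for wk, bk in zip(w, b))
    delta = S_w * S_wbb - S_wb ** 2                 # Eq. (11-26)
    A = (S_wM * S_wbb - S_wb * S_wbM) / delta       # Eq. (11-24)
    B = (S_w * S_wbM - S_wb * S_wM) / delta         # Eq. (11-25)
    w_A = delta / S_wbb                             # Eq. (11-27)
    w_B = delta / S_w                               # Eq. (11-28)
    v = [Mk - (A + B * bk) for bk, Mk in zip(b, M)]  # residuals
    sigma_e = math.sqrt(sum(wk * vk * vk for wk, vk in zip(w, v)) / (n - 2))
    sigma_i = sigma_max                             # Eq. (11-30)
    return {
        "A": A, "B": B,
        "A_int": sigma_i / math.sqrt(w_A), "B_int": sigma_i / math.sqrt(w_B),
        "A_ext": sigma_e / math.sqrt(w_A), "B_ext": sigma_e / math.sqrt(w_B),
    }

# Table 11-2: b = 10^4/T, M = -log p, with the estimated standard deviations.
exp1 = fit_line([4.1339, 4.0437, 4.0290, 4.0258],
                [4.641, 4.323, 4.266, 4.256],
                [0.019, 0.024, 0.026, 0.026])
exp2 = fit_line([4.0634, 4.0437, 4.0000, 3.9777],
                [4.460, 4.381, 4.231, 4.159],
                [0.030, 0.032, 0.038, 0.042])

for name, res in (("Experiment 1", exp1), ("Experiment 2", exp2)):
    print(name, {key: round(val, 4) for key, val in res.items()})
```

Because the errors in A and B depend only on the ratio of the unit-weight error to the square root of the weight, any other choice of C in Eq. (11-33) would leave the final errors unchanged; only the intermediate weights would be rescaled.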


11-7 REJECTION OF OBSERVATIONS: MORE THAN 
ONE UNKNOWN 

Table VI in the report* from which the example of the previous section 
was taken provides an example in which we can examine the process of 
looking for excessively large residuals when there is more than one quantity 
to be determined by the method of least squares. The pertinent data from 
that table are reproduced here in the first three columns of Table 11-4. We 
extend the procedure outlined in Section 10-10 to this case by using the 
criterion with which we calculate the probability of occurrence of a 
residual of a particular size from the size of the sample. The present 
example is a straight-line case; two unknowns are to be determined. It is 
assumed that the observations are externally consistent, and the procedure 
amounts to an examination of the validity of that assumption. 

The progress of the examination is shown in Table 11-4. The $\sigma$ calculated
for each trial is that of Eq. (11-29). After the first trial it is found
that for Run 1, 0.0753 > 0.0722; this run is discarded. After the second
trial it is found that for Run 2, 0.0576 > 0.0546; this run is also discarded.


* O. C. Simpson, R. J. Thorn, and G. H. Winslow, loc. cit.

Table 11-4

Run     $y_n$     $10^4/T$      $v_1$        $v_2$        $v_3$

12     2.425     3.973     -0.0426     -0.0245     -0.0181
11     2.520     4.000     -0.0250     -0.0085     -0.0024
 1     2.453     4.008     +0.0753
10     2.685     4.044     -0.0066     +0.0074     +0.0131
 3     2.787     4.063     -0.0294     -0.0165     -0.0110
 9     2.964     4.108     -0.0188     -0.0085     -0.0035
 2     3.087     4.154     +0.0500     +0.0576
 8     3.262     4.186     +0.0084     +0.0141     +0.0184
 4     3.427     4.223     -0.0024     +0.0013     +0.0052
 7     3.583     4.266     +0.0209     +0.0220     +0.0255
 5     3.703     4.294     +0.0176     +0.0171     +0.0204
15     3.824     4.316     -0.0117     -0.0134     -0.0104
 6     3.863     4.333     +0.0202     +0.0175     +0.0203
14     3.972     4.350     -0.0180     -0.0216     -0.0189
13     4.054     4.365     -0.0374     -0.0420     -0.0394

First trial:  $y_n = -14.1798 + 4.1687\,(10^4/T)$;  $\sigma$ = 0.0339
              (2 × 15 - 1)/(4 × 15) = 0.4833;  $X/\sigma$ = 2.13;  X = 0.0722

Second trial: $y_n = -13.9325 + 4.110\,(10^4/T)$;  $\sigma$ = 0.0260
              (2 × 14 - 1)/(4 × 14) = 0.4821;  $X/\sigma$ = 2.10;  X = 0.0546

Third trial:  $y_n = -13.8876 + 4.1013\,(10^4/T)$;  $\sigma$ = 0.0202
              (2 × 13 - 1)/(4 × 13) = 0.4808;  $X/\sigma$ = 2.07;  X = 0.0418

Conclusion:   $y_n = -13.888 \pm 0.174 + (4.101 \pm 0.041)\,(10^4/T)$

After the third calculation it is found that there is no residual with an 
absolute value greater than 0.0418. At this last trial, we see that when a 
residual is so large that its probability of occurrence is less than 3.8%, we 
can assume that its existence invalidates the assumption of external 
consistency. This is more stringent than the 5% level which is often used 
for this sort of consideration. The 5% level occurs at $P_X$ = 0.475, $X/\sigma$ =
1.96, X = 0.0396 for the third trial. While there is also no residual
greater than this, the one at Run 13 is essentially equal to it. 
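
The successive trials of Table 11-4 can be reproduced with a short Python sketch. The assumptions here are ours: the fit is an unweighted straight line, the rejection limit X is found from the half-area (2n - 1)/(4n) of the normal curve by means of the standard-library `statistics.NormalDist`, residuals are taken as calculated minus observed (the sign convention that matches the table), and the names `fit_unweighted`, `rejection_limit`, and `reject_until_consistent` are purely illustrative.

```python
import math
from statistics import NormalDist

# (run, y_n, 10^4/T) from the first three columns of Table 11-4
RUNS = [
    (12, 2.425, 3.973), (11, 2.520, 4.000), (1, 2.453, 4.008),
    (10, 2.685, 4.044), (3, 2.787, 4.063), (9, 2.964, 4.108),
    (2, 3.087, 4.154), (8, 3.262, 4.186), (4, 3.427, 4.223),
    (7, 3.583, 4.266), (5, 3.703, 4.294), (15, 3.824, 4.316),
    (6, 3.863, 4.333), (14, 3.972, 4.350), (13, 4.054, 4.365),
]

def fit_unweighted(points):
    """Least-squares line y = A + B*x with all weights equal to unity."""
    n = len(points)
    xs = [p[2] for p in points]
    ys = [p[1] for p in points]
    xbar, ybar = sum(xs) / n, sum(ys) / n
    Sxx = sum((x - xbar) ** 2 for x in xs)
    Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    B = Sxy / Sxx
    A = ybar - B * xbar
    resid = [(A + B * x) - y for x, y in zip(xs, ys)]  # calculated minus observed
    sigma = math.sqrt(sum(v * v for v in resid) / (n - 2))  # Eq. (11-29), w = 1
    sigma_B = sigma / math.sqrt(Sxx)
    sigma_A = sigma * math.sqrt(sum(x * x for x in xs) / (n * Sxx))
    return {"A": A, "B": B, "sigma": sigma,
            "sigma_A": sigma_A, "sigma_B": sigma_B, "resid": resid}

def rejection_limit(n, sigma):
    """Residual size X whose half-area under the normal curve is (2n - 1)/(4n)."""
    half_area = (2 * n - 1) / (4 * n)
    return NormalDist().inv_cdf(0.5 + half_area) * sigma

def reject_until_consistent(points):
    """Drop the largest residual, one per trial, while it exceeds the limit."""
    points = list(points)
    rejected = []
    while len(points) > 3:
        fit = fit_unweighted(points)
        limit = rejection_limit(len(points), fit["sigma"])
        worst = max(range(len(points)), key=lambda k: abs(fit["resid"][k]))
        if abs(fit["resid"][worst]) <= limit:
            return rejected, fit
        rejected.append(points.pop(worst)[0])
    return rejected, fit_unweighted(points)

rejected, final_fit = reject_until_consistent(RUNS)
print("rejected runs:", rejected)
print("A = %.3f +/- %.3f, B = %.3f +/- %.3f" % (
    final_fit["A"], final_fit["sigma_A"], final_fit["B"], final_fit["sigma_B"]))
```

A new line is fitted after each rejection, as in the table, because the removal of one run changes both the coefficients and the $\sigma$ against which the remaining residuals are judged.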


PROBLEMS 

1. Multiply 630.45 ± 0.62 by 25.635 ± 0.024 and give the complete result. 
Answer: 16162 ± 22 

2. Add 21.42 ± 0.61, 338.161 ± 0.042, and 543.1 ± 1.5 and give the com-
plete result.

Answer: 902.7 ± 1.6

3. The dimensions of a right circular cylinder are found to be:

length 12.183 ± 0.024 cm 

diameter 4.242 ± 0.021 cm 

where the errors are standard deviations. Find the volume and its standard 
deviation. 

Answer: 172.2 ± 1.7 cm$^3$

4. A rectangular steel rod of width b and depth d is supported at its ends 
and loaded at its center by a weight W. If the length of the rod between its 
supports is l, and a is the deflection at the center, then 

$$a = \frac{Wl^3}{4Ebd^3},$$

where E is the modulus of elasticity. Measurements give b = 8.113 ± 
0.042 mm, d = 10.50 ± 0.025 mm, l = 1.000 m precise to 1 in 5000, 
a = 2.622 mm precise to 0.25%, W = 2 kg precise to 0.02 g. 

(a) Compute the modulus of elasticity. 

(b) Assume that each measure of spread is a standard deviation and 
calculate separately the percent spread each would produce in E, if the 
other spreads were negligible. 

(c) Calculate the standard deviation in E due to all components. 

Answer: (a) $2.102 \times 10^9$ gm/cm$^2$;

(b) W, $10^{-3}$%; l, $6 \times 10^{-2}$%; a, 0.25%; b, 0.52%; d, 0.71%;

(c) $1.9 \times 10^7$ gm/cm$^2$

5. Show that the results of Eqs. (11-24) through (11-29) are obtained cor- 
rectly if each original observation equation is multiplied through by the 
square root of its weight and the set is then treated as being equally 
weighted. Is this true for the general linear problem? 

6. For (y, x)-data which satisfy

$$y = ax + bx^2,$$

where the observed values of y are equally weighted and the values of x
are known exactly, show that the equation can be viewed as the result
of applying the procedure of Problem 5 to data of the form

$$(y/x) = a + bx, \quad \text{weight } x^2.$$

7. The quantities n and k, to be combined into

$$y = \frac{Ank}{(n^2 + 2)^2},$$

are found to have the values 2.00 ± 0.04 and 0.250 ± 0.005, respectively,
where the errors are the standard deviations.

(a) Without introducing any constant of proportionality, find the product
of y and the square root of its weight.

(b) If A = 2.00 exactly, find the standard deviation in y.

Answer: (a) 25.8 (b) $1.08 \times 10^{-3}$

8. The specific gravities of zinc sulfate solutions were found to be:

sp. gr.    % ZnSO$_4$

1.020       2
1.040       4
1.060       6
1.087       8
1.110      10
1.129      12
1.155      14
1.185      16

(a) Use y = sp. gr. − 1 and plot the data in order to judge whether it is
satisfactory to consider that the specific gravity is a linear function of the
composition.

(b) Finding this unsatisfactory, evaluate a and b in $y = ax + bx^2$ where
x is the percent of ZnSO$_4$.

(c) Evaluate $\sigma_e$ and consider whether any of the points should be rejected.

(d) Evaluate $\sigma_a$, $\sigma_b$, $\sigma_y(1)$, $\sigma_y(9)$.

(e) Plot the curve derived in part (b) on the graph made in part (a).
Show the spread at x = 1 and at x = 9 with error bars of total length
$2\sigma_y$ as in Fig. 11-2.

Answer: a = (9.63 ± 0.33) × $10^{-3}$, b = (1.16 ± 0.25) × $10^{-4}$, $\sigma_y(1)$ =
3.1 × $10^{-4}$, $\sigma_y(9)$ = 1.25 × $10^{-3}$. No points can be rejected.

9. Verify Table 11-3 from the data of Table 11-2. Make the initial calcula- 
tions of A and B to four decimal places. 

10. In a particular experiment, ( y , x)-data is expected to obey y = A + Bx, 
and the best values of A and B are desired. It is assumed that x is known 
more exactly than y. Measurements of y at some fixed value of x showed 
that a standard deviation of ± 0.15 could be expected in a single observa- 
tion. The following data are then taken: 


  x        y

0.40     2.50
1.00     4.95
1.50     6.50
2.20     7.00
2.60     8.75
3.00    10.00


Find the best values of A and B and their standard deviations. 

Answer: Solution according to the experimenter's assumption shows that
a serious mistake was made; $\sigma_e$ is about four times 0.15. Recalculation with
y as the exactly known variable gives

A = 1.71 ± 0.69, B = 2.75 ± 0.27.

Note that an initial plot of the data, with error bars shown on y , would 
have revealed the mistake immediately. 

11. Measurements of the vapor pressure of uranium dioxide were made by 
heating in a good vacuum a sample contained in a cylindrical tungsten 
crucible having a small circular orifice of radius r in the center of its lid. 
Part of the vapor streaming out the orifice was condensed on a circular 
target of radius R at a distance d vertically above the orifice and concentric 
with it. If W is the number of grams condensed on the target in t seconds 
when the temperature of the sample is T degrees Kelvin, the pressure in 
atmospheres is given by 

$$p = 1.3726 \times 10^{-3}\,\frac{W\sqrt{T}}{Gt},$$

where G, called the geometry factor, is

$$G = \frac{\pi r^2 R^2}{d^2 + R^2}.$$

The construction is such that d remains constant as the temperature is 
raised. It can be assumed that t and T are known exactly. Prior study 
shows that one can take 10 μg as the standard deviation in an observation
of W. The following measurements pertaining to the geometry are made 
at room temperature: 


d = 11.286 ± 0.008 cm, 
R = 0.9507 ± 0.0004 cm, 
r = 0.0716 ± 0.009 cm.


The following data are then taken : 


T (°K) t (sec) W (μg)


2617 180 252 
2688 120 314 
2720 120 595 
2778 60 544 
2809 30 370 


The object of the experiment is to determine the dependence of p on T; 
the principal terms in such a relation are given by 

$$\log p = A + \frac{B}{T},$$

but it is frequently necessary to include a term in log T or in $T^{-2}$. It is
desired to exert proper control of the experiment by comparing the internal
and external consistencies of the results: (Note. This problem has been 
adapted from R. J. Ackermann, The High Temperature, High Vacuum 
Vaporization and Thermodynamic Properties of Uranium Dioxide, Argonne 
National Laboratory Report ANL-5482. The final least-squares results 
are not asked for, though that could be done; that calculation is adequately 
illustrated in other problems. The purpose here is to illustrate the steps 
which must precede, in a real laboratory situation, the solution of normal 
equations.) 

(a) What possible systematic source of error of the sort referred to in 
Section 5-1 as a theoretical error must be examined to see if it need be 
considered? 

(b) Examine the separate sources of error in G. Need they all be included? 

(c) Can any parts of the total problem be done by careful slide-rule work, 
and if so, what parts? 

(d) Should correction be made for the possible systematic error referred 
to in (a)? If so, how carefully need it be done? 

(e) Examine the separate sources of error in p. Need they all be included? 

(f) Compute the values of log p and plot them against $10^4/T$.

(g) Compute the individual errors in log p and show them as error bars 
on the graph made in (f). [See part (e) of Problem 11-8.] 

(h) Does a third term in the equation for log p seem called for? 

(i) Does there appear to be reasonable consistency between the size 
of the error bars and the scatter in log p? 

Answer: (a) Increase in area of the orifice at the high temperatures. (b) No,
only the error in r need be included. (c) Everything can be done by careful
slide-rule work except the determinations of $T^{-1}$ and the solution of the
normal equations. The numerical coefficient in the equation for p is then
converted to $1.373 \times 10^{-3}$. (d) Since the precision error in r is not large
compared with the change in r by expansion, correction must be made for
the latter, but it need only be made by using the average temperature of
the experiment. (The authors used $4.6 \times 10^{-6}$ as the thermal expansion
coefficient of tungsten, 2700 °K as the average temperature of the experi-
ment, and 300 °K as room temperature.) (e) The fractional errors in W
and G are of comparable size, and both must be included. (f) and (g)
Typically, log p (atm) = 4.07 ± 0.020 at $10^4/T$ = 3.821. (h) Yes. (i)
Yes.

12. In the Behr free fall experiment where timing is done with a tuning fork 
(see Section 10.5) the locations of the peaks of the trace can be chosen 
much more precisely when the fork is dropping slowly than when it is 
dropping rapidly near the end of its fall, where the amplitude of vibration 
also has become small. In order to estimate the weights of the various 
observations, consider the equation of the trace to be 

$$y = y_0 e^{-kt} \sin \omega t,$$

with

$$t = \sqrt{2x/g}.$$

Here $\omega$ is given by the frequency of the fork, k is a measure of the rate of
decrease of amplitude of the fork, x is the distance it has fallen, and g is
the acceleration due to gravity. By supposing that it is necessary to be
able to detect some fixed change in the deflection of the fork from a peak
deflection, show that the error in x at each peak measured is proportional
to $\sqrt{x}$. With a fork of frequency 150 vib/sec and observations of x taken
every 8 complete vibrations starting from an arbitrarily chosen zero, the
following observations are made:


Observation number x (cm) 


0 1.4 

1 5.4 

2 12.8 

3 21.5 

4 35.0 

5 49.2 


(a) Using an observation equation of the form 

$$x = x_0 + v_0 t + \tfrac{1}{2} g t^2,$$

find the properly weighted value of g, and its standard deviation indicated 
by these observations. 

(b) Does the size of the standard deviation reflect the precision or the 
accuracy of the experiment? 

(c) Considering the value of g obtained, the size of its standard deviation,
and that the known value of g is ~981 cm/sec$^2$, would it be sensible to
include a frictional drag in the statement of the observation equation in 
order to get a better value for g, or would it be more sensible to use some 
other experimental arrangement? 

Answer: (a) g = 962 ± 48 cm/sec$^2$ (b) precision (c) Clearly frictional
drag represents a systematic error. The inclusion of a term for it in the 
observation equation would improve the accuracy, but the precision is too 
poor to warrant doing this. An improved experimental arrangement is 
called for. 

(Note. Actually, the precision of this experiment is much better than 
the made-up data here would indicate. The problem was put together in 
this way to combine an illustration of an approach to the problem of deter- 
mining weights of observations and of the sort of considerations made in 
part (c).)


INTRODUCTION TO
STATISTICAL ANALYSIS


Reference was made more than once in the previous chapters to an im- 
portant difference between the situations in the broad field of error analysis 
in which the physical scientist finds himself and those in which the 
biological, medical, or social scientists generally find themselves. We 
emphasized the difference in the relative sizes of the standard deviations 
in the variety of measurements made by these two groups. The com- 
parative sharpness of his error distribution curves has enabled the physical 
scientist to make tremendous progress with small samples and unsophisti- 
cated methods of error analysis. 

This, however, is not the only difference. Of equally great importance 
is the fact that investigations in the physical sciences usually involve a 
known relationship or a well-defined proposed relationship between groups 
of experimental variables and fixed parameters. The object of the in- 
vestigation is usually to determine the values of the parameters. We can 
often safely assume that all but one of the experimental variables are 
known with practically absolute certainty. That is, we can assume that 
all those uncontrolled factors which produce random departures from 
exact agreement with the relationship affect only the one variable. An 
example is the vapor pressure of graphite as a function of temperature 
which we discussed in Sections 11-6 and 11-7. The error distribution is 
one dimensional. 

If, on the other hand, one proposed to investigate the relationship, 
which is only vaguely definable, between the height and weight of 25-year- 
old men, for example, it is clearly impossible to make any such assumption. 
While one might expect a tendency toward an association of large weight 
and large height, and vice versa, there is no basis on which one could 
assign the status of independent variable to one variable, the status of 
dependent variable to the other, and then make an error analysis only of 
the latter. The uncontrolled factors such as frame structure, diet, amount 
and nature of 25 years of exercise, degree of normality of glandular
function, and so on, affect both the height and weight in ways that could be
studied separately. But in the restricted context proposed above, they 
are completely unknown. Thus one would have to treat both the height 
and weight as random variables. The “error” distribution would be two 
dimensional. 

Faced with problems of this sort, and with large standard deviations, 
the biological and social scientists have been forced to do much more 
than merely use large samples. They have to help themselves also by the 
development of much more erudite and, of course, complex methods of 
analysis than we have discussed so far in this book. A natural by-product 
of this development is the growing realization by the physical scientist of 
the usefulness of such methods. He has begun to wonder, to refer to a 
previous example, whether there is any chance that he has fooled himself 
by not considering the temperature as a random variable. The intent of 
this chapter then is to provide an elementary introduction to some of these 
more advanced methods. 

Most of our discussion will be concerned with more detailed investiga- 
tions of the properties of one-dimensional distributions. Our examination 
of the expectation value of the range in Section 10-11 and of the methods 
of internal vs. external consistency in Sections 11-5 and 11-6 were quali- 
tative or approximate approaches to the types of analysis which we now 
wish to make more precise. Briefly, we wish to put on a sounder numerical 
basis the problem of deciding whether two or more alternative results of 
experimentation or observation are definitely different, or whether the 
differences can be explained in terms of random or chance errors of 
observation. 

It is, of course, impossible to give an absolute answer to such a question. 
It must be realized that statistical analysis is not like plane geometry or 
counting. In a discipline of the latter kind one knows, for instance, that 
501 is different from 502. In statistics, on the other hand, the questions 
are, “When we take into consideration all we know about the sources of 
these numbers, what is the probability that the differences are due merely 
to chance?” and, “When we consider the consequences, i.e., count the cost, 
of an erroneous decision, at what level of probability are we sufficiently 
convinced of the validity of a result so that we will base our future actions 
on it?” 

Before proceeding with the principal subject matter of the chapter, it 
will be convenient to discuss an extension of an earlier topic. In Sections 
6-3 through 6-6 we considered the performance of an experiment such 
as the tossing of n coins or the rolling of n dice simultaneously. We arrived 
at the binomial distribution. We will now consider some other aspects of 
this problem to prepare ourselves for some of the procedures we shall 
develop later in the chapter.


12-1 EXTENSION TO N TRIALS

Suppose that we designate as a success some particular result of an experi- 
ment with a single simple element, such as a die. This result could be the 
appearance of any particular face, or of either of two particular faces. 
Let the probability of this success be called p. The probability of failure,
1 − p, will be called q. We found that if we made a simultaneous trial
with n such elements, the probability of achieving r successes where r
is some number between 0 and n is given by

$$\mathcal{P}(n, r) = C(n, r)\,p^r q^{n-r}, \tag{12-1}$$

in which C(n, r) are the coefficients in the binomial expansion of $(p + q)^n$.
We also found the expectation value and standard deviation to be given
by $\mu = np$ and $\sigma = \sqrt{npq}$ respectively.

The appearance of the multiplicative factor C(n, r ) means that it does 
not matter which element or elements produce success in a toss. Thus 
the problem remains unchanged if, instead of trying n elements simul- 
taneously, we try a single element n times, except that we will now say 
that it does not matter which of the n successive trials result in successes. 
After the completion of n trials, we will find a probability distribution for 
the number r of successes, and this probability distribution is given by 
Eq. (12-1). 

We can use the same line of argument for N trials each of which is the 
simultaneous trial of n elements. Again we will define success to suit our 
purposes of the moment; there is no absolute definition of success in this 
field. Suppose that success in one of the N trials is the appearance of some 
fixed number, say r, of a particular result in a single trial of the n elements. 
It might be the appearance of two fives in a single toss of 10 dice. Keeping 
in mind that in this experiment a single elementary trial involves n
elements, we see that the probability of success in a single trial is the
$\mathcal{P}(n, r)$ given by Eq. (12-1). It does not matter which of the N trials
result in success. Consequently, the probability that R of the N trials will
result in success is given by

$$\mathcal{P}(N, R) = C(N, R)\,[\mathcal{P}(n, r)]^R\,[1 - \mathcal{P}(n, r)]^{N-R}. \tag{12-2}$$

Furthermore, by the same arguments as used previously, we find that the
expectation value is

$$\mu = N\,\mathcal{P}(n, r) \tag{12-3}$$

and the standard deviation is

$$\sigma = \sqrt{N\,\mathcal{P}(n, r)\,[1 - \mathcal{P}(n, r)]}. \tag{12-4}$$

Equation (12-3) agrees with our intuitive notion that the expected
number of successes is just the number of trials multiplied by the
probability of success in a single trial. The arguments used in Section 6-6
also apply here, however. One could hope to demonstrate the result with
certainty only by the impossible procedure of approaching an infinite
number of trials. Such an imaginary procedure, however, would merely
demonstrate the validity of the primary distribution $\mathcal{P}(n, r)$.
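
As a small numerical illustration of Eqs. (12-1) through (12-4), consider the example of two fives in a single toss of 10 dice. The number of tosses, N = 100, and the function name `binomial_pmf` are assumptions made only for this sketch; `math.comb` supplies the coefficients C(n, r).

```python
from math import comb, sqrt

def binomial_pmf(n, r, p):
    """Eq. (12-1): probability of r successes among n elements."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Success in one elementary trial: exactly two fives among 10 dice.
p_single = binomial_pmf(10, 2, 1 / 6)

N = 100  # hypothetical number of tosses of the 10 dice
mean_R = N * p_single                          # Eq. (12-3)
std_R = sqrt(N * p_single * (1 - p_single))    # Eq. (12-4)

# Eq. (12-2): distribution of the number R of successful tosses.
dist = [binomial_pmf(N, R, p_single) for R in range(N + 1)]
total = sum(dist)
mean_from_dist = sum(R * P for R, P in enumerate(dist))

print(f"P(10, 2) = {p_single:.4f}")
print(f"expected successes in {N} tosses = {mean_R:.2f} +/- {std_R:.2f}")
```

The sum `total` and the mean `mean_from_dist` are computed only as a check that Eq. (12-2) is a properly normalized distribution whose expectation value agrees with Eq. (12-3).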

Our choice of using the binomial distribution as the point of departure
to arrive at conclusions for N trials of an experiment was merely a matter
of convenience. The argument was clearly independent of that choice;
it would have proceeded in exactly the same way if we had used some
other probability function than $\mathcal{P}(n, r)$. The latter could be replaced by
some term in a Poisson distribution, or even by the integral over some
particular interval of a continuous distribution, such as the normal
distribution, without altering the result. In other words, when one performs N
trials of an experiment in which the probability of success in a single trial
is p and that of failure is consequently 1 − p, he is performing a binomial
experiment, regardless of the form of p. The expectation value is Np,
and the standard deviation is $\sqrt{Np(1 - p)}$.

Let us emphasize these remarks particularly for the case where p is
drawn from a Poisson distribution. The distribution of the results of N
trials of an experiment governed by a Poisson distribution is not itself a
Poisson distribution. The value of N will usually be large but finite, and
that of p small but also finite. As an example, suppose that for the
Poisson distribution of Table 6-5 we made $10^4$ observations to determine
the expected number of one-sec intervals in which six counts would
appear. We would compare the result with an expectation value of
$10^4 \times 0.1575$, or 1575 such intervals, having a standard deviation of
$\sqrt{10^4 \times 0.1575 \times 0.8425}$, or 36.4 intervals.
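
The arithmetic of this example is easily checked. In the sketch below it is assumed that the Poisson distribution of Table 6-5 has a mean of 6.5 counts per one-sec interval, the value used in the uranium example of Section 6-6; that assumption reproduces the quoted probability 0.1575 for six counts. The function name `poisson_pmf` is illustrative only.

```python
from math import exp, factorial, sqrt

def poisson_pmf(r, mu):
    """Poisson probability of exactly r counts when the mean is mu."""
    return exp(-mu) * mu ** r / factorial(r)

mu = 6.5                 # assumed mean of the Table 6-5 distribution
p = poisson_pmf(6, mu)   # probability of six counts in a one-sec interval

N = 10 ** 4              # number of one-sec observations
expected = N * p                   # binomial expectation value, Np
spread = sqrt(N * p * (1 - p))     # binomial standard deviation

print(f"p = {p:.4f}; expected intervals = {expected:.0f} +/- {spread:.1f}")
```

Note that the spread is computed from the binomial expression $\sqrt{Np(1 - p)}$ and not from $\sqrt{Np}$, which is the point of the preceding paragraph.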

We mentioned earlier that the outcome of an imagined infinite number 
of trials would demonstrate the form of the distribution which describes 
the probability of each individual result. In a subsequent section we shall 
develop a procedure which will enable one to use the results of Eqs. (12-3) 
and (12-4) with a finite number of trials in order to assess numerically 
the probability of his observational results’ being indeed governed by the 
distribution that he has proposed for them. 


12-2 RADIOACTIVE COUNTING 

Our extension to N trials is specially suited to radioactive counting because 
of the nature of the probability of success involved in a single trial of one 
element in such a problem. For this reason, we shall examine counting 
experiments with a little more care at this point before going further.

In Section 6-6 we considered as an example the decay of ½ mg of
$^{238}_{92}$U. It was pointed out that the probability of decay of a single atom,
i.e., the probability of success in a single trial of one element, was about
$5 \times 10^{-18}$ per sec. Thus, in that section, a single trial consisted in the
observation for one sec. It is a quantum mechanical result that if an
atom is known to be $^{238}_{92}$U, i.e., undecayed, at a particular time, then the
probability that it will decay subsequently increases linearly with the
time. We have the alternatives then of regarding such an experiment as
N trials of one-sec duration each, for each of which $p = 5 \times 10^{-18}$, or as
a single trial of N-sec duration for which $p = N \times 5 \times 10^{-18}$, or as
anything between these two extremes. As is so often the case, the choice
depends on the purposes of the experiment.

If the duration of a trial is short, obviously there will be considerable 
fluctuation in the number of counts per trial, i.e., per time interval. We 
would use a large number of relatively short intervals if we wanted to 
test whether the decay rate follows the Poisson distribution. On the 
other hand, when the object is the more common one of determining the 
probability of decay per sec, the best and simplest procedure is to use a 
single long interval. 

We will continue using the numerical values of the example in Section 6-6, 
writing results as though the expected values of $\mu$ and $\sigma$ had been observed 
experimentally. Suppose that a single trial consists in the observation of 
the $1.3 \times 10^{18}$ atoms for $10^4$ sec. Then $N = 1$ and $p = 5 \times 10^{-14}$. 
The Poisson conditions are still satisfied; the expectation value, however, 
is now $6.5 \times 10^4$ with a standard deviation of $\sqrt{6.5 \times 10^4}$, or 255 counts 
per interval of $10^4$ sec. When the intervals are of one-sec duration, we see 
from Section 6-6 that the observation is $6.5 \pm 2.55$ counts per sec.* The 
present result is $6.5 \times 10^4 \pm 255$ counts per $10^4$ sec or $6.5 \pm 0.0255$ 
counts per sec. 
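
The arithmetic of this paragraph can be reproduced with a short sketch. It assumes only the Poisson rule that the standard deviation is the square root of the expected count; the function name `count_summary` is ours and does not appear in the text.

```python
import math


def count_summary(n_atoms, p_per_sec, interval_sec):
    """Return (mean counts, sigma of counts, rate per sec, sigma of rate per sec).

    Assumes the Poisson conditions of the text, so sigma = sqrt(mean).
    """
    mean_counts = n_atoms * p_per_sec * interval_sec
    sigma_counts = math.sqrt(mean_counts)
    return (mean_counts, sigma_counts,
            mean_counts / interval_sec, sigma_counts / interval_sec)


if __name__ == "__main__":
    # The two cases discussed above: one-sec intervals and 10^4-sec intervals.
    for interval in (1.0, 1.0e4):
        mean, sigma, rate, rate_sigma = count_summary(1.3e18, 5e-18, interval)
        print(f"{interval:g} sec: {mean:.4g} +/- {sigma:.3g} counts, "
              f"{rate:.2f} +/- {rate_sigma:.4f} counts per sec")
```

Lengthening the interval by a factor of $10^4$ raises the standard deviation of the count by a factor of 100 but lowers the standard deviation of the rate per sec by the same factor.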

Furthermore, with the expectation value as large as $6.5 \times 10^4$ (much 
smaller values would suffice for this), the situation is as described in 
Section 8-3. That is, if we took observations for several, say $M$, $10^4$-sec 
intervals, we would expect them to follow a normal distribution with a 
mean of $6.5 \times 10^4$ and a standard deviation of 255. The difference, ½, 
between the mean and the most probable value, which we found in 
Section 8-3, is of no significance. It follows from Eq. (10-15), derived 
for a normal distribution, that the average of $M$ observations, each for a 
$10^4$-sec interval, would have a standard deviation of $255/\sqrt{M}$. We see 
then that when the counts are given on a per sec basis and all the readings 
are used, it is immaterial whether we look at these $M$ observations as 
described, or as a single observation for an interval of $10^4 M$ sec. This 
conclusion is most easily described by a table (see Table 12-1). 

* Here it is convenient to write the result in this way, but the reader should 
recall the discussions in Section 9-4 relative to the meaning and usefulness of 
the standard deviation for asymmetric and noncontinuous distributions. 


Table 12-1 

Interval      No. of         $\mu$                         $\sigma$           $\sigma_0$          count/sec 
(sec)         observations   (sec)$^{-1}$                  (sec)$^{-1}$       (sec)$^{-1}$ 

$10^4$        $M$            $6.5 \times 10^4$             255                $255/\sqrt{M}$      $6.5 \pm 0.0255/\sqrt{M}$ 
$10^4 M$      1              $6.5 \times M \times 10^4$    $255\sqrt{M}$      $255\sqrt{M}$       $6.5 \pm 0.0255/\sqrt{M}$ 


The extension of this discussion to other decay rates, masses of material, 
lengths of interval, and numbers of observations is obvious. It must be 
remembered, however, that the whole discussion hinged on the assumption 
that the value of n, the number of basic elements, is unchanged during 
the progress of this experiment. No matter how the total number of 
counts involved is partitioned, this number must be small compared 
with the total number of atoms present if the preceding discussion is to 
be applicable. 
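
The agreement of the two entries in the last column of Table 12-1 can be written out explicitly. The following lines are a worked restatement of the numbers already given, under the same assumption that the number of atoms does not change appreciably; nothing new is introduced.

```latex
\begin{align*}
\text{$M$ intervals of $10^4$ sec:}\quad
  & \frac{255/\sqrt{M}}{10^4\ \text{sec}}
    = \frac{0.0255}{\sqrt{M}}\ \text{counts per sec}, \\
\text{one interval of $10^4 M$ sec:}\quad
  & \frac{\sqrt{6.5 \times 10^4\, M}}{10^4 M\ \text{sec}}
    = \frac{255\sqrt{M}}{10^4 M\ \text{sec}}
    = \frac{0.0255}{\sqrt{M}}\ \text{counts per sec}.
\end{align*}
```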

12-3 MULTINOMIAL DISTRIBUTION 

When we considered the rolling of n dice in Section 6-6, our attention was 
centered on the probable number of dice showing a particular face without 
regard to what faces the other dice showed. That is, we found that in a 
rolling of n dice, the probability of obtaining r aces, for instance, is 

$$C(n, r)\left(\tfrac{1}{6}\right)^{r}\left(\tfrac{5}{6}\right)^{n-r},$$

regardless of what the other n — r dice show so long as they are not 
aces. There are times, however, when one is concerned with what other 
faces, or, depending on the problem, the analogy to “other faces,” turn 
up along with these r aces. 

In the case of a good die, the probability that any one face appears 
as the result of a toss is the same. In the following discussion, however, 
while it is helpful to keep the dice in mind as a framework on which to 
hang the argument, we shall distinguish the probabilities of occurrence 
of different elementary results from one another so that our conclusions 
will have more general applicability than just to the throwing of good 
dice. 

Suppose there is an elementary event, such as the rolling of a single 
die, which can produce any one of $k$ results, and suppose also that the 
probability of the $i$th result is $p_i$. Then 

$$\sum_{i=1}^{k} p_i = 1.$$

In the case of a die, for instance, all the values of $p_i$ are 1/6, and there are 
six of them. 

Now, suppose we cause the successive occurrence of $n$ of these events.* 
Let $n_i$ be the number of results of the $i$th type, which has elementary 
probability $p_i$. Then 

$$\sum_{i=1}^{k} n_i = n,$$

and our object is to determine the probability of this particular distribution 
of the values of $n_i$. 

It is evident that the probability that the first event will produce the 
first type of result is $p_1$, that the second event will produce the first type 
of result is $p_1$, and so on up to the $n_1$th event, that the $(n_1 + 1)$th event 
will produce the second type of result is $p_2$, and so on through the $n$th event. 
The joint probability of these $n$ results is then 

$$\prod_{i=1}^{k} p_i^{n_i}.$$

In fact, this is also the joint probability of the $n$ results, distributed as to 
type according to the specified values of $n_i$, for any order of appearance. 

Our interest, however, is not in any particular order of occurrence; we 
wish to know the probability of getting the specified distribution of the 
values of $n_i$ regardless of the order in which they occur. Thus we need 
here, as in the binomial distribution, a factor which corresponds to the 
$C(n, r)$ of the latter. We can make use of the knowledge gained in the 
discussion of C(n, r) in Section 6-5 to arrive at the present factor. The 
starting point is Eq. (6-3), which gives the number of combinations of n 
things taken r at a time. It is to be remembered that this expression is 
based on the fact that the r things can be any r of the group of n things, 
and that any particular r things can be picked in any order. 

Then, if we are only interested in the number $n_1$, we will find the 
necessary multiplicative factor to be 

$$\frac{n!}{n_1!\,(n - n_1)!}.$$

* Following the discussions of Section 12-1, the reader will understand that n 
successive trials with one element, followed by appropriate sorting of the results 
into groups, is equivalent to the simultaneous use of n elements. 

For any one choice of $n_1$ things, we can choose $n_2$ in 

$$\frac{(n - n_1)!}{n_2!\,(n - n_1 - n_2)!}$$

ways since, after having chosen $n_1$, there are only $(n - n_1)$ things from 
which to pick $n_2$. We can repeat this argument for all such factors, the 
last of which is 

$$\frac{(n - n_1 - \cdots - n_{k-1})!}{n_k!\,0!},$$

where, as was pointed out in Section 6-5, $0! = 1$. 

Since each of these factors is the number of choices of a particular type 
for any single choice of the types preceding it, the total number of ways 
of picking $n_1$ things of a first type, $n_2$ things of a second type, and so on 
up to $n_k$ things of a $k$th type, out of a total of $n$ things is the product of 
all these. Thus the desired multiplicative factor is 

$$\frac{n!}{n_1!\,n_2! \cdots n_k!},$$

and the multinomial distribution is 

$$\mathcal{P}(n_i, p_i) = \frac{n!}{n_1!\,n_2! \cdots n_k!}\; p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}. \tag{12-5}$$
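
Equation (12-5) is simple to evaluate directly. The following is a minimal sketch; the function name `multinomial_probability` and the argument layout (one list of counts $n_i$, one list of probabilities $p_i$) are our own choices for illustration.

```python
from math import factorial, prod


def multinomial_probability(counts, probs):
    """Evaluate Eq. (12-5) for counts n_1..n_k and probabilities p_1..p_k."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("the elementary probabilities must sum to 1")
    n = sum(counts)
    # n! / (n_1! n_2! ... n_k!); every intermediate quotient is an integer.
    coefficient = factorial(n)
    for n_i in counts:
        coefficient //= factorial(n_i)
    return coefficient * prod(p ** n_i for p, n_i in zip(probs, counts))


if __name__ == "__main__":
    # Six good dice, each face appearing exactly once: 6!/6^6.
    print(multinomial_probability([1, 1, 1, 1, 1, 1], [1 / 6] * 6))
    # Two result types only: reduces to the binomial case, 2 aces in 10 dice.
    print(multinomial_probability([2, 8], [1 / 6, 5 / 6]))
```

With only two types of result the expression reduces to the binomial probability $C(n, r)\,p^r (1 - p)^{n-r}$, which is a convenient check.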


12-4 THE RANGE 

In Section 10-11 we gave an approximate derivation for the expectation 
value of the range, the range being the difference between the largest 
and the smallest readings in a set. In that derivation it was supposed that 
the individual readings were normally distributed, but it was pointed 
out that the range itself has a distribution which is not normal. In the 
present section we shall begin the examination of several quantities which 
fit this same general description; they are all descriptive of some aspect of 
a finite group of readings drawn from a normal distribution, but the quan- 
tities themselves are variables which are not normally distributed. 

This kind of interplay between different forms of distributions appeared 
in Section 12-1; it was shown that the probability of R successes or, 
equivalently, of R particular results out of N trials is binomially dis- 
tributed regardless of the form of the distribution which determines the 
probability of occurrence of that particular result in a single trial. 

A proper discussion of the distribution of the range must begin with an 
extension of this idea to the multinomial distribution. While the discussion 
will proceed on the assumption that the probabilities of occurrence of 
individual events in a single trial are given by the normal distribution, 
the reader should remember that this assumption is not essential to the 
argument; we use it merely because this is the most used and tabulated 
case. 

One part of the argument, however, will be peculiar to using continuous 
distributions as the source of the probabilities of individual events. With 
a continuous distribution, as illustrated in Section 9-5, we only get a 
finite probability by integrating the distribution function over some 
interval of the variable which is the argument of the function. Such an 
interval is arbitrary and changeable for different applications. Hence we 
will find a distribution function for the range, which we will use in the 
same way as we used the normal function. To find the probability that 
the range lies within a certain interval, we must integrate the distribution 
function over that interval. The procedure which we will use to arrive 
at the function is a limiting process similar to that used in the definition 
of the derivative. 

The point of departure can be made crystal clear by a statement which 
is almost foolishly obvious. Among the finite number of readings in hand 
there must be a greatest and a smallest, and all the others must lie between 
these two. Thus we must determine the probability that all our observa- 
tions lie within specified bounds. To do so, we first divide the whole 
interval in which the readings can fall into five sections. The first section 
extends from $-\infty$ to $x - (\Delta x)/2$, where $x$ is the smallest reading; the 
second extends from $x - (\Delta x)/2$ to $x + (\Delta x)/2$; the third from $x + (\Delta x)/2$ 
to $y - (\Delta y)/2$, where $y$ is the greatest reading; the fourth from $y - (\Delta y)/2$ 
to $y + (\Delta y)/2$; and the fifth extends from $y + (\Delta y)/2$ to $+\infty$. 

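For a normal distribution with expectation value $\mu$ and standard 
deviation $\sigma$ we can write the distribution function as 

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] \tag{12-6}$$

if we use Eq. (9-4) in Eq. (8-12) and if we agree to let $x$ now represent a 
reading; the real error then would be $(x - \mu)$. The probabilities that a 
reading will occur in the intervals described above are then 

$$p_1 = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x - (\Delta x)/2} \exp\left[-\frac{(t - \mu)^2}{2\sigma^2}\right] dt,$$

$$p_2 = \frac{1}{\sigma\sqrt{2\pi}} \int_{x - (\Delta x)/2}^{x + (\Delta x)/2} \exp\left[-\frac{(t - \mu)^2}{2\sigma^2}\right] dt,$$

etc. 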


We must next write down the probability that, of the $n$ readings, none 
will fall in the first or fifth intervals, one each in the second and fourth 
intervals, and $n - 2$ in the third interval, where the order in which the 
readings of various sizes occur is immaterial. This is clearly a multinomial 
distribution with a probability, from Eq. (12-5), 

$$\Delta P = \frac{n!}{0!\,1!\,(n - 2)!\,1!\,0!}\; p_1^{0}\, p_2\, p_3^{n-2}\, p_4\, p_5^{0} = n(n - 1)\, p_2\, p_3^{n-2}\, p_4; \tag{12-7}$$

the probability is written as $\Delta P$ since it depends on the sizes of $\Delta x$ and $\Delta y$. 

The limiting process mentioned earlier will involve an evaluation at 
small values of $\Delta x$ and $\Delta y$. Consequently, we keep only those terms in the 
evaluation of $p_1$, $p_2$, etc., which contain $\Delta x$ and $\Delta y$ to their first powers as 
factors. Expressions like those for $p_2$ and $p_4$ were evaluated under these 
conditions in Section 8-1. The results for these two are 

$$p_2 = \frac{\Delta x}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] \tag{12-8}$$

and 

$$p_4 = \frac{\Delta y}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right]. \tag{12-9}$$

By using the definition of $f$ in Eq. (12-6), we examine $p_3$: 

$$p_3 = \int_{x + \Delta x/2}^{y - \Delta y/2} f(t)\, dt = \int_x^y f(t)\, dt - \int_x^{x + \Delta x/2} f(t)\, dt - \int_{y - \Delta y/2}^{y} f(t)\, dt.$$

The first term will retain its value even when $\Delta x$ and $\Delta y$ become vanishingly 
small. Mindful of our intent to keep only the first powers of $\Delta x$ and $\Delta y$ 
and inspecting Eqs. (12-7) through (12-9), we see that for $p_3$ we need 
only keep the integral from $x$ to $y$. The expression for $\Delta P$ then will involve 
the product $\Delta x\,\Delta y$. If we divide this expression by $\Delta x\,\Delta y$ and make the 
latter vanishingly small, we obtain the following distribution function as 
the result: 

$$f(x, y) = \frac{n(n - 1)}{\sigma^n (2\pi)^{n/2}} \exp\left[-\frac{(x - \mu)^2 + (y - \mu)^2}{2\sigma^2}\right] \left[\int_x^y \exp\left(-\frac{(t - \mu)^2}{2\sigma^2}\right) dt\right]^{n-2}. \tag{12-10}$$


If we wished to know the probability of finding a smallest reading 
between $x$ and $x + \Delta x$ along with a greatest reading between $y$ and 
$y + \Delta y$, we would evaluate 

$$\int_x^{x + \Delta x} \int_y^{y + \Delta y} f(x, y)\, dx\, dy. \tag{12-11}$$


This is not what we are after, however. We wish to have a function with 
which to find the probability of $r$ lying between $R$ and $R + \Delta R$, where 

$$r = y - x,$$

regardless of the individual values of $x$ and $y$. For this purpose we must 
make a change of variable, 

$$y = x + r, \tag{12-12}$$

and then integrate over all possible values of $x$ at a constant value of $r$. 
The reader will realize that $r$ is bound to be nonnegative. 

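Before rewriting the function with the new variable we note that there 
are other useful changes of variables; the reader should verify the results 
of all of these. After using Eq. (12-12), let 

$$u = \frac{x - \mu}{\sigma}, \qquad w = \frac{r}{\sigma}, \qquad v = \frac{t - \mu}{\sigma}.$$

With these substitutions, we obtain the distribution 

$$f(x, y)\, dx\, dy \to \sigma^2 f(u, w)\, du\, dw,$$

so that the distribution function becomes 

$$f(w, n) = \frac{n(n - 1)}{(2\pi)^{n/2}} \int_{-\infty}^{\infty} \exp\left[-\frac{u^2 + (u + w)^2}{2}\right] \left(\int_u^{u + w} e^{-v^2/2}\, dv\right)^{n-2} du. \tag{12-13}$$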


Here $n$ in $f(w, n)$ is arbitrarily inserted for convenience; it is the total 
number of readings. Inspection of Eq. (12-13) shows that, remarkably 
enough, $f(w, n)$ is independent of $\mu$ and $\sigma$; it depends only on $w$ and $n$. 
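
Equation (12-13) can be evaluated numerically without special tables. The sketch below is one possible approach and is not the method used for the published tables: the inner integral is written with the error function, $\int_u^{u+w} e^{-v^2/2}\,dv = \sqrt{\pi/2}\,[\operatorname{erf}((u+w)/\sqrt{2}) - \operatorname{erf}(u/\sqrt{2})]$, and the outer integrals are done by Simpson's rule with finite cutoffs. The names `range_density` and `range_moment`, the cutoffs, and the step counts are our assumptions.

```python
import math


def _simpson(func, a, b, steps):
    """Composite Simpson rule on [a, b]; steps is rounded up to an even number."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = func(a) + func(b)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * func(a + k * h)
    return total * h / 3.0


def range_density(w, n, u_limit=8.0, steps=400):
    """f(w, n) of Eq. (12-13): density of (range / sigma) for n normal readings."""
    if w < 0.0:
        return 0.0
    root2 = math.sqrt(2.0)

    def integrand(u):
        # Inner integral of exp(-v^2/2) from u to u + w, via the error function.
        inner = math.sqrt(math.pi / 2.0) * (
            math.erf((u + w) / root2) - math.erf(u / root2))
        return math.exp(-(u * u + (u + w) ** 2) / 2.0) * inner ** (n - 2)

    prefactor = n * (n - 1) / (2.0 * math.pi) ** (n / 2.0)
    # The integrand is centered near u = -w/2, so the lower cutoff moves with w.
    return prefactor * _simpson(integrand, -u_limit - w, u_limit, steps)


def range_moment(n, power, w_max=12.0, steps=240):
    """Integral of w**power * f(w, n) from 0 to w_max (a stand-in for infinity)."""
    return _simpson(lambda w: w ** power * range_density(w, n), 0.0, w_max, steps)


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(n, range_moment(n, 0), range_moment(n, 1))
```

The zeroth moment should come out close to unity for every $n$, and the first moment is the expectation value of the ratio of the range to the standard deviation.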

While Eq. (12-13) looks very complex, and indeed its numerical evalua- 
tion can be complicated for n > 2, we have seen that it is reasonably simple 
in concept. Furthermore, its basic meaning and its use are no different 
from those of the normal-distribution function given by Eq. (12-6). In 
particular, the expectation value for the ratio of the range to the standard 
deviation, deduced approximately in Section 10-11, is given exactly by 

$$\mu_w(n) = \int_0^\infty w\, f(w, n)\, dw.$$

The standard deviation of the ratio is found from 

$$\sigma_w^2(n) = \int_0^\infty [w - \mu_w(n)]^2 f(w, n)\, dw,$$

and the probability that the ratio lies within the interval from zero to $W$ is 

$$\mathcal{P}(W, n) = \int_0^W f(w, n)\, dw. \tag{12-14}$$


For purposes of illustration, we shall evaluate these expressions for the 
simplest form of $f(w, n)$, which is $f(w, 2)$. To do this, we must first find 
$f(w, 2)$. Equation (12-13) gives 

$$f(w, 2) = \frac{1}{\pi} \int_{-\infty}^{\infty} \exp\left[-\frac{u^2 + (u + w)^2}{2}\right] du = \frac{1}{\pi}\, e^{-w^2/2} \int_{-\infty}^{\infty} e^{-(u^2 + uw)}\, du.$$

One way of evaluating the integral is shown in Appendix 5, where it is 
found to have the value $\sqrt{\pi}\, e^{w^2/4}$. Thus 

$$f(w, 2) = \frac{1}{\sqrt{\pi}}\, e^{-w^2/4},$$

which is just twice the normal-distribution function with $\sigma = \sqrt{2}$. After 
noting this fact, however, we should recall that $w \ge 0$ only. Thus $f(w, 2)$ 
is properly normalized; that is, 

$$\int_0^\infty f(w, 2)\, dw = 1.$$

The expectation value for $w$ when there are only two readings is then 

$$\mu_w(2) = \frac{1}{\sqrt{\pi}} \int_0^\infty w\, e^{-w^2/4}\, dw.$$

Since $w\, dw = \tfrac{1}{2}\, d(w^2)$, we can easily obtain $\mu_w(2) = 2/\sqrt{\pi}$. The standard 
deviation will be given by 

$$\sigma_w^2(2) = \frac{1}{\sqrt{\pi}} \int_0^\infty w^2 e^{-w^2/4}\, dw - \frac{4}{\sqrt{\pi}} \left(\frac{1}{\sqrt{\pi}} \int_0^\infty w\, e^{-w^2/4}\, dw\right) + \frac{4}{\pi} \left(\frac{1}{\sqrt{\pi}} \int_0^\infty e^{-w^2/4}\, dw\right).$$

Integrals of the form in the first term above are discussed in Appendix 6. 
In the second term, the factor in parentheses is $\mu_w(2)$, while the factor in 
parentheses in the last term is unity. The final result is 

$$\sigma_w(2) = \sqrt{2 - \frac{4}{\pi}}.$$
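
For the reader who wishes to check the final step, the three terms can be evaluated one at a time. This is only a worked restatement of the result above; it uses the standard integral $\int_0^\infty x^2 e^{-ax^2}\,dx = \sqrt{\pi}/(4a^{3/2})$ with $a = 1/4$, and the decimal values are rounded.

```latex
\begin{align*}
\frac{1}{\sqrt{\pi}} \int_0^\infty w^2 e^{-w^2/4}\, dw
  &= \frac{1}{\sqrt{\pi}} \cdot 2\sqrt{\pi} = 2, \\
\frac{4}{\sqrt{\pi}}\, \mu_w(2)
  &= \frac{4}{\sqrt{\pi}} \cdot \frac{2}{\sqrt{\pi}} = \frac{8}{\pi}, \\
\sigma_w^2(2)
  &= 2 - \frac{8}{\pi} + \frac{4}{\pi} = 2 - \frac{4}{\pi} \approx 0.727,
  \qquad \sigma_w(2) \approx 0.853 .
\end{align*}
```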

Table A on p. 230 describes the normal distribution by giving the 
areas under the distribution function for uniform increments of the 
argument of the function. Various measures of the distributions of finite- 
sized groups of readings, which we shall discuss in this chapter, including 
the range, are generally tabulated in an inverse manner. That is, values 
of a variable integration limit are usually tabulated for particular, often 
uniformly incremented, areas under the distribution functions. These 
areas might be computed by integration from the tabulated values to 
infinity, or from zero to the tabulated values, or otherwise in ways most 
appropriate to the quantity being tabulated. In principle, the problem is 
exactly the same as with that of the calculation of probable error, found 
in Section 9-2. 

For the present case, Eq. (12-14) indicates that the integration is to 
be from zero to some value W, where W is either tabulated or plotted. 
As an example, suppose that we wish to know the value of W such that 
there is a 90% chance that the ratio of the difference of a pair of readings 
to the standard deviation of the distribution from which the readings are 
drawn lies between zero and W. Then the equation to be solved is 

$$0.9 = \frac{1}{\sqrt{\pi}} \int_0^W e^{-w^2/4}\, dw. \tag{12-15}$$

It was mentioned earlier that this distribution function is of the normal 
form. In fact, if we set 

$$w = \sqrt{2}\, t,$$

we can rewrite Eq. (12-15) as 

$$0.45 = \frac{1}{\sqrt{2\pi}} \int_0^{W/\sqrt{2}} e^{-t^2/2}\, dt$$

and interpolate from Table A in the same way as we do in the rejection 
of readings. Thus 

$$\frac{W}{\sqrt{2}} = 1.644,$$

so that 

$$W = 2.325.$$

It will be well now to review the meaning of these results. Suppose 
that we draw a very large number of pairs of readings ($n = 2$ in our 
example) from a normally distributed universe with a standard deviation 
$\sigma$. We would expect the average of the differences between the two readings 
constituting each pair to be 

$$\bar{r} = \frac{2\sigma}{\sqrt{\pi}}.$$

We would also expect the separate values of $w = r/\sigma$ to be distributed 
about the number $2/\sqrt{\pi}$ with a standard deviation of $\sqrt{2 - 4/\pi}$. Finally, 
we would expect 90% of the values of $r$ to be less than or equal to $2.33\sigma$. 
Perhaps it should be reemphasized that the $\sigma$ in these expressions is that 
of the normal distribution from which the readings are drawn. On the 
other hand, if one takes only two readings with a difference $R$, he would 
estimate $\sigma$ to be 

$$\sigma \approx \frac{R\sqrt{\pi}}{2}$$

and the standard deviation of the average of the two to be 

$$\sigma_0 = \frac{R\sqrt{\pi}}{2\sqrt{2}}. \tag{12-16}$$

Furthermore, if he knew $\sigma$, or thought he did, from earlier observations, 
and if he took a pair of readings and found that their difference divided 
by $\sigma$ was 2.33 or greater, he might begin to wonder whether something 
might have gone wrong since there is only one chance in ten of this happening. 
Most probably he would accept this result, but if the ratio were 
as large as 3.64, he would probably reject the readings or reexamine his 
value of $\sigma$, since this value of the ratio has only one chance in one hundred of 
being exceeded. 
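
The limits 2.33 and 3.64 quoted above can be checked without interpolating in a table. The sketch below assumes the closed form $\mathcal{P}(W, 2) = \operatorname{erf}(W/2)$, which follows from Eq. (12-14) with $f(w, 2) = e^{-w^2/4}/\sqrt{\pi}$ on substituting $s = w/2$; this closed form and the function names are ours, not the text's.

```python
import math


def pair_range_cdf(W):
    """Eq. (12-14) with n = 2: probability that w = r/sigma lies between 0 and W.

    Substituting s = w/2 in (1/sqrt(pi)) * integral of exp(-w**2/4) gives erf(W/2).
    """
    if W <= 0.0:
        return 0.0
    return math.erf(W / 2.0)


def pair_range_limit(probability, tol=1e-10):
    """Solve pair_range_cdf(W) = probability for W by bisection."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    lo, hi = 0.0, 20.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pair_range_cdf(mid) < probability:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for probability in (0.90, 0.99):
        print(probability, round(pair_range_limit(probability), 3))
```

The computed limits differ from the tabular interpolation only in the last figure, as is to be expected from a four-place table.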

From Eq. (12-16) we find that 

$$\frac{\sigma_0}{R} = \frac{\sqrt{\pi}}{2\sqrt{2}} \approx 0.627.$$

This kind of information is to be found in Fig. 10-2, though the figure 
does not give values of $\sigma_0/R$ for $n < 5$ because the approximation used 
in its derivation gives poor results at low values of $n$. More complete 
information is given in Fig. 12-1, which includes results for the small 
values of $n$ as well as bounds between which $\sigma_0/R$ can be expected to fall 
with the specified probabilities. The expectation value of $\sigma_0/R$ given here 
should be compared with that in Fig. 10-2. The precipitous loss of precision 
with few readings is clearly shown by the limit curves. (Figure 12-1 
is plotted from Table V of Ref. 9.) 

12-5 THE CHI-SQUARE DISTRIBUTION: TESTS OF FIT 

In Section 12-1, we promised to develop a procedure by which we can 
assess numerically, from the results of Eqs. (12-3) and (12-4) and a 
finite number of trials, the probability that the observational results are 
indeed governed by the proposed distribution. Although, in one sense or 
another, any of the distributions discussed in Sections 12-4 through 12-8 
will serve the same purpose, it was the $\chi^2$ (chi-square) distribution which 
we had in mind. 

We have repeatedly emphasized that it would take an infinite number 
of observations to demonstrate with absolute certainty that one's observations 
follow a particular distribution. Obviously one cannot make an 
infinite number of observations. On the other hand, one can easily propose 
some distribution that the observations appear to be following. He can 
then calculate the expected results, and compare them with the actual 
results. If the proposed distribution is correctly chosen (and the choice 
becomes more restricted with increasing number of observations), he will 
find the results to agree closely, though not exactly, with each other. 
With the aid of the $\chi^2$-distribution he can calculate the extent of chance 
disagreement that might be expected with a finite number of observations 
even though the proposed distribution is correct. The inference is that if 
the disagreement is greater than expected, then the proposed distribution 
must be wrong. It should be understood that “wrong” includes the case 
where the correct form might have been chosen, but with wrong parameters. 

Let us illustrate this last statement with a classic example of the application 
of the $\chi^2$-distribution. Suppose that one makes several trials at 
rolling 10 biased dice, each of which has been loaded in the same way. 
The expected distribution is a binomial distribution of the kind illustrated 
in Fig. 6-2, but the numerical values will not be as given in that figure. 
The dice for the distributions of Fig. 6-2 were “good” in the sense that 
the a priori probability for the appearance of any given face in a single 
toss of one die was 1/6. When the dice are loaded, this is no longer the case. 

In the following derivation, we will see that the number $\chi^2$ is basically 
the ratio of the sum of the squares of the errors to the square of the standard 
deviation. In our earlier discussions (see Sections 12-1, 12-2, and 12-4) 
we pointed out the distinction between the binomial or multinomial 
distributions resulting from N trials of an experiment and the distribution 
which describes the expected results of a single trial. Also, when we dis- 
cussed the range, we chose to use a normal distribution for the probabilities 
of single events; we shall continue to do so.* In this case, the “errors” in 
the definition of $\chi^2$ almost certainly will be residuals, since it is very 
unlikely that for those distributions assumed to be normal there is any 
other source of information about the expectation value than the arithmetic 
mean of the observations. The same situation characterizes the standard 
deviation, and we shall use these two facts as constraints, the effect of 
which we shall describe later. 

The $\chi^2$-test is also applicable to other kinds of distributions. For a 
derivation not specifically tied to the normal distribution, the reader is 
referred to Fry [2]. We have already met several examples of such other 
distributions. In the case of a binomial distribution, when the problem 
is the tossing of coins or the rolling of dice, the errors in the definition of 
$\chi^2$ may be considered as the real errors to the extent that the dice or coins 
may be considered as good, and the value of the probability of success 
in a single trial of one element is given by a priori considerations. Then 
there are fewer constraints than in the case of a normal distribution. 
However, the number of constraints will increase if we take the view that 
the elemental probabilities are biased by the manner of tossing, by loaded 
dice, filed coins, etc. To take this view means that some number or numbers 
previously assumed to be known must now be calculated from the observations 
before we can proceed with the test. 

* The arguments presented in this and the following two sections are similar 
to those presented by Arley and Buch [8], pp. 93f. 

Suppose that observations are drawn from a normal distribution which 
has a mean $\mu$ and a standard deviation $\sigma$. Only some finite number can be 
drawn; let this number be $n$ and let the individual observations be designated 
$x_i$. The probability of drawing the $i$th value is, from Eq. (12-6), 

$$f(x_i)\, dx_i = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right] dx_i.$$

The total probability of getting the $n$ independent observations, being the 
product of the probabilities of getting the individual ones, is 

$$d\Phi = \frac{1}{(\sigma\sqrt{2\pi})^n} \exp\left[-\frac{\sum (x_i - \mu)^2}{2\sigma^2}\right] \prod dx_i, \tag{12-17}$$

where the sum $\sum$ and product $\prod$ are over all the values of $i$ from 1 
through $n$. 

The next step is to define $n$ new variables to replace the $x_i$'s. So long 
as there are $n$ new independent ones, defined in terms of the $n$ $x_i$'s, the 
problem will be just as describable in terms of them as in terms of the 
$x_i$'s. First, let the $(n - 1)$th of the new variables, $u_{n-1}$, be the mean of 
the observations. That is, 

$$u_{n-1} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i. \tag{12-18}$$

As before, $v_i$ will represent the $i$th residual; 

$$v_i = x_i - \bar{x}. \tag{12-19}$$

We define the $n$th new variable, $u_n$, as 

$$u_n = q = \left(\sum_{i=1}^{n} v_i^2\right)^{1/2}. \tag{12-20}$$

Finally, the first $(n - 2)$ of the new variables will be defined as 

$$u_i = v_i/q, \qquad i = 1, \ldots, n - 2. \tag{12-21}$$

The sum of the squares of the real errors, which appears in Eq. (12-17), 
becomes, upon expansion, 

$$\sum x_i^2 - 2\mu \sum x_i + n\mu^2.$$

From Eq. (12-18) we obtain 

$$\sum x_i = n\bar{x},$$

and from Eqs. (12-19) and (12-20), 

$$q^2 = \sum x_i^2 - 2\bar{x} \sum x_i + n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2.$$

With this as a source for $\sum x_i^2$, we find that the sum of the squares of the 
real errors becomes 

$$\sum (x_i - \mu)^2 = n(\bar{x} - \mu)^2 + q^2. \tag{12-22}$$
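
Equation (12-22) is an algebraic identity and holds for any set of readings and any value of $\mu$, which makes it easy to check numerically. The following is a minimal sketch; the function name and the sample readings are ours.

```python
def sum_sq_decomposition(readings, mu):
    """Return both sides of Eq. (12-22) for the given readings and mean mu."""
    n = len(readings)
    mean = sum(readings) / n
    # q^2 is the sum of the squares of the residuals, Eqs. (12-19) and (12-20).
    q_squared = sum((x - mean) ** 2 for x in readings)
    left = sum((x - mu) ** 2 for x in readings)
    right = n * (mean - mu) ** 2 + q_squared
    return left, right


if __name__ == "__main__":
    left, right = sum_sq_decomposition([1.0, 2.0, 4.0, 7.0], mu=3.0)
    print(left, right)
```

The two numbers agree to within rounding whatever value is supplied for `mu`; only the split between the two terms on the right changes.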

It is now necessary to convert $\prod_{i=1}^{n} dx_i$ to some expression in terms 
of the new variables. The first step is to solve for the old variables in 
terms of the new ones. This is easy for the first $n - 2$ values of $i$. 
Recognizing that $\bar{x}$ is $u_{n-1}$ and $q$ is $u_n$, we can use Eqs. (12-19) and (12-21) 
to find 

$$x_i = u_{n-1} + u_n u_i, \qquad i = 1, \ldots, n - 2. \tag{12-23}$$

From Eq. (12-18) and by writing the last two items of the sum separately, 
we find that 

$$n u_{n-1} = x_n + x_{n-1} + \sum_{i=1}^{n-2} x_i, \tag{12-24}$$

and from Eq. (12-23), 

$$\sum_{i=1}^{n-2} x_i = (n - 2)\, u_{n-1} + u_n \sum_{i=1}^{n-2} u_i. \tag{12-25}$$

Henceforth we shall write $\sum_{i=1}^{n-2} u_i$ simply as $\sum u$. Then, combining 
Eqs. (12-24) and (12-25), we obtain 

$$x_n + x_{n-1} = 2u_{n-1} - u_n \sum u.$$

Subsequently we will find it more convenient to write this as 

$$(x_n - u_{n-1}) + (x_{n-1} - u_{n-1}) = -u_n \sum u. \tag{12-26}$$

By a similar operation with Eq. (12-20) we find that 

$$(x_n - u_{n-1})^2 + (x_{n-1} - u_{n-1})^2 = u_n^2 \left(1 - \sum u^2\right), \tag{12-27}$$

where, as before, $\sum u^2$ means $\sum_{i=1}^{n-2} u_i^2$. If we transpose $(x_n - u_{n-1})$ to 
the right-hand side of Eq. (12-26) and then square the equation, we will 
obtain an expression for $(x_{n-1} - u_{n-1})^2$ for substitution into Eq. (12-27). 
Solving the resulting quadratic equation results in 

$$x_n = u_{n-1} - \tfrac{1}{2} u_n \left\{\sum u + \sqrt{2} \left[1 - \sum u^2 - \tfrac{1}{2} \left(\sum u\right)^2\right]^{1/2}\right\}, \tag{12-28}$$

and a back substitution leads to 

$$x_{n-1} = u_{n-1} - \tfrac{1}{2} u_n \left\{\sum u - \sqrt{2} \left[1 - \sum u^2 - \tfrac{1}{2} \left(\sum u\right)^2\right]^{1/2}\right\}. \tag{12-29}$$

During the solution of the quadratic equation there will arise the question 
of the sign of the radical. The choice is immaterial since it does not matter 
which observation is called $x_n$ and which $x_{n-1}$. 
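
The change of variables can be checked by a round trip: compute the $u$'s from a set of readings, then recover the last two readings from Eqs. (12-28) and (12-29). The sketch below does this under the definitions of Eqs. (12-18) through (12-21); the function names are ours. Because of the arbitrary sign of the radical, the two recovered readings may come back in either order.

```python
import math


def to_new_variables(x):
    """Return (u_1..u_{n-2}, u_{n-1} = mean, u_n = q) for the readings x."""
    n = len(x)
    mean = sum(x) / n
    q = math.sqrt(sum((xi - mean) ** 2 for xi in x))
    u = [(xi - mean) / q for xi in x[:n - 2]]
    return u, mean, q


def last_two_readings(u, mean, q):
    """Recover (x_{n-1}, x_n) from Eqs. (12-29) and (12-28)."""
    s = sum(u)                       # sum of u
    t = sum(ui * ui for ui in u)     # sum of u squared
    # max() guards against a tiny negative argument caused by rounding.
    radical = math.sqrt(2.0) * math.sqrt(max(0.0, 1.0 - t - 0.5 * s * s))
    x_n = mean - 0.5 * q * (s + radical)
    x_n_minus_1 = mean - 0.5 * q * (s - radical)
    return x_n_minus_1, x_n


if __name__ == "__main__":
    readings = [1.0, 2.0, 4.0, 7.0]
    u, mean, q = to_new_variables(readings)
    print([mean + q * ui for ui in u])      # Eq. (12-23): first n - 2 readings
    print(last_two_readings(u, mean, q))    # last two readings, in either order
```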

At this point the reader is referred to Appendix 7 on multiple integra- 
tion. According to the procedures described there, it is necessary to set 
up a determinant of partial derivatives of the old variables with respect 
to the new ones in order to make the transition from one integrand and 
one set of differentials to an equivalent integrand, but in terms of new 
variables, and new differentials. Thus, the necessary relation is now seen 
to be the following: 


$$\prod_{i=1}^{n} dx_i = J(x_1, \ldots, x_n;\, u_1, \ldots, u_n) \prod_{i=1}^{n} du_i, \tag{12-30}$$

where $J$ is the determinant just mentioned. As examples of the differentiation, 
we can easily see from Eq. (12-23) that for $x_3$, say, all derivatives 
with respect to the $u$'s are zero except 

$$\frac{\partial x_3}{\partial u_3} = u_n, \qquad \frac{\partial x_3}{\partial u_{n-1}} = 1, \qquad \frac{\partial x_3}{\partial u_n} = u_3.$$


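By extension then, $J$ is the determinant 

$$J = \begin{vmatrix}
u_n & 0 & 0 & \cdots & 1 & u_1 \\
0 & u_n & 0 & \cdots & 1 & u_2 \\
0 & 0 & u_n & \cdots & 1 & u_3 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
\dfrac{\partial x_{n-1}}{\partial u_1} & \dfrac{\partial x_{n-1}}{\partial u_2} & \dfrac{\partial x_{n-1}}{\partial u_3} & \cdots & 1 & \dfrac{\partial x_{n-1}}{\partial u_n} \\
\dfrac{\partial x_n}{\partial u_1} & \dfrac{\partial x_n}{\partial u_2} & \dfrac{\partial x_n}{\partial u_3} & \cdots & 1 & \dfrac{\partial x_n}{\partial u_n}
\end{vmatrix},$$

where 

$$\frac{\partial x_{n-1}}{\partial u_i} = -\frac{1}{2} u_n \left\{1 + \frac{2u_i + \sum u}{\sqrt{2} \left[1 - \sum u^2 - \frac{1}{2} \left(\sum u\right)^2\right]^{1/2}}\right\},$$

$$\frac{\partial x_{n-1}}{\partial u_n} = -\frac{1}{2} \left\{\sum u - \sqrt{2} \left[1 - \sum u^2 - \tfrac{1}{2} \left(\sum u\right)^2\right]^{1/2}\right\},$$

$$\frac{\partial x_n}{\partial u_i} = -\frac{1}{2} u_n \left\{1 - \frac{2u_i + \sum u}{\sqrt{2} \left[1 - \sum u^2 - \frac{1}{2} \left(\sum u\right)^2\right]^{1/2}}\right\},$$

$$\frac{\partial x_n}{\partial u_n} = -\frac{1}{2} \left\{\sum u + \sqrt{2} \left[1 - \sum u^2 - \tfrac{1}{2} \left(\sum u\right)^2\right]^{1/2}\right\}$$

for $i = 1, \ldots, n - 2$. 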

It is a property of determinants that a factor that is common to every 
entry in a single column (or row) can be divided out as a multiplier of the 
determinant [12]. Examination of the determinant $J$ shows that $u_n$ is a 
factor common to the elements of every one of the $n$ columns except 
the last two. Furthermore, this is the only way in which $u_n$ enters the 
determinant. Thus 

$$\prod_{i=1}^{n} dx_i = q^{n-2}\, C(u_1, \ldots, u_{n-2}) \prod_{i=1}^{n-2} du_i\; d\bar{x}\, dq,$$

where $C$ represents what is left of the determinant after the division by 
$u_n$ ($= q$). Note that $u_{n-1}$ ($= \bar{x}$) does not appear in $J$ at all. Thus, at this 
point, the original probability as given in Eq. (12-17) has become 

$$d\Phi = \frac{1}{(\sigma\sqrt{2\pi})^n} \exp\left[-\frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right] \exp\left(-\frac{q^2}{2\sigma^2}\right) q^{n-2}\, d\bar{x}\, dq\; C \prod_{i=1}^{n-2} du_i. \tag{12-31}$$


The first thing we observe in Eq. (12-31) is that if we multiply by 
$\sqrt{n}/\sqrt{n}$, we can separate a set of factors 

$$\frac{1}{(\sigma/\sqrt{n})\sqrt{2\pi}} \exp\left[-\frac{(\bar{x} - \mu)^2}{2(\sigma/\sqrt{n})^2}\right] d\bar{x}$$

from the others, where the latter become independent of $\bar{x}$. Thus we see 
that the error in $\bar{x}$, the mean of the observations, is a normally distributed 
variable with a standard deviation of $\sigma/\sqrt{n}$, as given previously in 
Eq. (10-15). 

From the remaining part of $d\Phi$ a second set of factors 

$$\left(\frac{q}{\sigma}\right)^{n-2} \exp\left(-\frac{q^2}{2\sigma^2}\right) \frac{dq}{\sigma} \tag{12-32}$$

can be divided out with no further reference to $q$ in the remaining factor. 
The expression (12-32), involving the factor $q^{n-2}$, is not a normal distribution. 
Furthermore, considering that various numerical factors have 
been left behind in $C$, the distribution of $q/\sigma$ given by the expression 
(12-32) must also be presumed to be nonnormalized. That is, contrary 
to the result found in our discussion of the range, where the integral of 
$f(w, 2)$ over all possible values of $w$ was found to be unity, we must here 
make the integral over all values of $q$ unity by the calculation of a 
normalizing factor. The value of $d\Phi$ can be held unchanged by including 
the reciprocal of the normalizing factor in the part left behind. 
In the previous case the only possible values for the range variable $w$ 
were found to be positive. Here also $q^2$ is necessarily positive since it is a 
sum of squares. For $q$ itself then we find it convenient to use only the 
positive square root. Further, since $q$ and $\sigma$ enter the expression (12-32) 
in a symmetric way, the normalizing factor must be a function only of $n$, 
which is the number of readings. 

Before evaluating the normalization factor then, we shall take advantage 
of this symmetry by making certain changes of variable similar to some 
of the changes made in the discussion of the range. It is the ratio $q/\sigma$ 
that enters, just as it was the ratio of $r$ to $\sigma$ that entered into the distribution 
of the range. Thus the distribution of $q/\sigma$ is independent of both the 
expectation value and the standard deviation of the distribution that 
determines the frequency of appearance of individual observations. 
Rather than discuss the distribution of $q/\sigma$, however, we shall discuss the 
distribution of $(q/\sigma)^2$; this will allow us to avoid the necessity of taking 
square roots. Thus we shall use 

$$w^2 = \frac{q^2}{\sigma^2}$$

as an integration variable and reserve the designation $\chi^2$ for observed 
values of this quantity; $\chi^2$ will be an integration limit. 

We recall that a number $\bar{x}$ was determined from the observations so 
that there is a constraint on the value of $q$. This means that $q^2$ is a sum 
of squares of numbers such that the sum of the numbers themselves is 
zero. At most, therefore, only $n - 1$ of the residuals can be called free 
variables. The number of “degrees of freedom” will be designated by $f$. 
Thus from the expression (12-32) we find that the distribution for which 
we must determine the normalization factor $P(f)$ is 

$$\varphi(w^2, f)\, d(w^2) = P(f)\, (w^2)^{(f-2)/2}\, e^{-w^2/2}\, d(w^2), \tag{12-33}$$

where the factor of ½ in 

$$dw = \frac{1}{2w}\, d(w^2)$$

is included in $P(f)$. 





The normalization factor is determined from

$$\int_0^\infty \varphi(w^2, f)\, d(w^2) = 1 = P(f) \int_0^\infty (w^2)^{(f-2)/2}\, e^{-w^2/2}\, d(w^2).$$

Here the reader is referred to Appendix 6 for a discussion of the evaluation
of definite integrals of this form. We see from this Appendix that

$$P(f) = \left[\Gamma(f/2) \times 2^{f/2}\right]^{-1}, \tag{12-34}$$

where $\Gamma(f/2)$ is read as “the gamma function of $(f/2)$.” The reader is not
required to know any more about this function than is described in
Appendix 6; it is a convenient designation to use here because of the way
in which the value of $P(f)$ changes depending on whether $f$ is an even or
an odd integer.
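The normalization in Eq. (12-34) can be checked numerically. The sketch below is only an illustration, not part of the original derivation: the function names are hypothetical, and the integral is taken over $w$ instead of $w^2$ (using $d(w^2) = 2w\,dw$) so that the integrand stays smooth at the origin for every $f$.

```python
import math

def norm_factor(f):
    """P(f) of Eq. (12-34): 1 / (Gamma(f/2) * 2**(f/2))."""
    return 1.0 / (math.gamma(f / 2) * 2 ** (f / 2))

def chi2_integral(f, upper=20.0, steps=20000):
    """Midpoint-rule estimate of the integral of (w^2)^((f-2)/2) e^(-w^2/2) d(w^2)
    from 0 to infinity, rewritten as the integral of 2 w^(f-1) e^(-w^2/2) dw.
    The cutoff `upper` is an assumption; the integrand is negligible beyond it."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * h
        total += 2.0 * w ** (f - 1) * math.exp(-w * w / 2.0)
    return total * h

# P(f) times the integral should be close to 1 for even and odd f alike.
products = {f: norm_factor(f) * chi2_integral(f) for f in range(1, 7)}
for f, value in products.items():
    print(f, round(value, 6))
```

The same `norm_factor` reproduces the two special cases used in the text: $P(4) = 1/4$ and $P(5) = (3\sqrt{2\pi})^{-1}$.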

It was mentioned in Section 12-4 that various alternative but equivalent
ranges of integration are used to describe the areas under distribution
functions of the kind we are concerned with here. For the present case,
Table B on p. 232 describes the distribution of $\chi^2$ by giving values of $\chi^2$
calculated from

$$\mathcal{P}(\chi^2, f) = \frac{1}{\Gamma(f/2) \times 2^{f/2}} \int_{\chi^2}^{\infty} (w^2)^{(f-2)/2}\, e^{-w^2/2}\, d(w^2) \tag{12-35}$$

for particular specified values of $\mathcal{P}(\chi^2, f)$.

The meaning of Eq. (12-35) is the following: For a given value of $f$,
there is a probability $\mathcal{P}(\chi^2, f)$ that, purely by chance, an observed value
of $\chi^2$ will be greater than or equal to the tabulated value when the
observed value is calculated from the correct distribution. The practical
use of Eq. (12-35) via Table B, which is constructed from it, can be
described best with the aid of examples to be given later. Before proceeding
to these examples, however, it will be well to illustrate the construction
of Table B. We will also need to do a little more mathematical work to
derive a property called the additivity of $\chi^2$.

Suppose that we wish to determine from five observations the value of
$\chi^2$ which has a 90% chance of being exceeded. In this case, $f = 4$, which
is an even number, so that from Appendix 6 we find that

$$\Gamma(2) = 1.$$

Thus we must evaluate $\chi^2$ in

$$0.9 = \frac{1}{4} \int_{\chi^2}^{\infty} w^2\, e^{-w^2/2}\, d(w^2).$$

This integral can be easily evaluated to yield the result $\chi^2 = 1.064$.
Suppose, on the other hand, that we wanted the value of $\chi^2$ which has
only a 10% chance of being exceeded. Then we would expect it to be
considerably larger than 1.064. From

$$0.1 = \frac{1}{4} \int_{\chi^2}^{\infty} w^2\, e^{-w^2/2}\, d(w^2)$$

we obtain $\chi^2 = 7.779$.
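For $f = 4$ the integral in Eq. (12-35) has the closed form $e^{-\chi^2/2}(1 + \chi^2/2)$, so the two values just quoted can be recovered with a simple root search. The following is a minimal sketch under that assumption; the helper names and the bisection bracket are illustrative choices, not the procedure used to build Table B.

```python
import math

def chi2_tail_f4(x):
    """P(chi^2 >= x) for f = 4.
    (1/4) * integral from x to infinity of u e^(-u/2) du = e^(-x/2) (1 + x/2)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

def solve_tail(target, lo=0.0, hi=100.0, tol=1e-10):
    """Bisection for the x at which the tail probability equals `target`.
    The tail decreases monotonically in x, so the bracket [lo, hi] shrinks safely."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_tail_f4(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

chi2_exceeded_90 = solve_tail(0.9)  # value exceeded 90% of the time
chi2_exceeded_10 = solve_tail(0.1)  # value exceeded 10% of the time
print(round(chi2_exceeded_90, 3), round(chi2_exceeded_10, 3))
```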

Finally, suppose that there were six observations and we wanted the
value of $\chi^2$ which has a 10% chance of being exceeded. Here $f = 5$, an
odd integer. The evaluation of $\chi^2$ in this case is somewhat trickier and will
be shown in greater detail.

From Appendix 6 we find that

$$\Gamma\left(\tfrac{5}{2}\right) = \tfrac{3}{2} \times \tfrac{1}{2} \times \sqrt{\pi},$$

so that

$$\mathcal{P}(\chi^2, 5) = 0.1 = \left(3\sqrt{2\pi}\right)^{-1} \int_{\chi^2}^{\infty} (w^2)^{3/2}\, e^{-w^2/2}\, d(w^2).$$

Since it is convenient here to use $w$ rather than $w^2$ as the integration
variable, the equation can be transformed into

$$0.1 = \frac{2}{3\sqrt{2\pi}} \int_{\chi}^{\infty} w^4\, e^{-w^2/2}\, dw.$$

We must make two successive integrations by parts. First, we let

$$u = w^3, \qquad dv = w\, e^{-w^2/2}\, dw$$

to obtain for the integral

$$\left[-w^3 e^{-w^2/2}\right]_{\chi}^{\infty} + 3 \int_{\chi}^{\infty} w^2\, e^{-w^2/2}\, dw.$$

Another substitution with $u = w$ yields

$$0.1 = \frac{2}{3\sqrt{2\pi}} \left(\chi^3 e^{-\chi^2/2} + 3\chi\, e^{-\chi^2/2} + 3 \int_{\chi}^{\infty} e^{-w^2/2}\, dw\right).$$


Since we are merely verifying the entries in the table, we shall substitute
the value $\chi^2 = 9.236$ from the table into this equation to see whether the
right-hand side has the value 0.1. The first two terms are straightforward.
The last term is evaluated as in the example for the range. That is, using
$\chi = \sqrt{9.236} = 3.04$ and obtaining from Table A

$$\frac{1}{\sqrt{2\pi}} \int_0^{3.04} e^{-w^2/2}\, dw = 0.4988,$$

we find that

$$\int_{3.04}^{\infty} e^{-w^2/2}\, dw = \sqrt{2\pi}\,(0.5 - 0.4988),$$

which verifies the value 0.1 for $\mathcal{P}(9.236, 5)$.
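The same verification can be repeated without Table A by evaluating the Gaussian tail with the complementary error function. This is a sketch of the arithmetic only; the function name is hypothetical, and `math.erfc` stands in for the table lookup.

```python
import math

def chi2_tail_f5(chi2):
    """Right-hand side of the integrated-by-parts expression for f = 5.
    The Gaussian tail integral of e^(-w^2/2) from chi to infinity equals
    sqrt(2 pi) * 0.5 * erfc(chi / sqrt(2))."""
    chi = math.sqrt(chi2)
    gauss_tail = math.sqrt(2.0 * math.pi) * 0.5 * math.erfc(chi / math.sqrt(2.0))
    bracket = (chi ** 3 + 3.0 * chi) * math.exp(-chi2 / 2.0) + 3.0 * gauss_tail
    return 2.0 / (3.0 * math.sqrt(2.0 * math.pi)) * bracket

probability = chi2_tail_f5(9.236)  # tabulated chi^2 for f = 5 at the 10% level
print(round(probability, 4))
```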

Let us now proceed to discuss the additivity of $\chi^2$. If one had two
independent sets of observations drawn from the same distribution, he
could calculate $\chi^2$ for each. The distribution of $\chi^2$ for each would be given
by Eq. (12-35), which is derived from Eq. (12-33); into this equation
would enter in turn the number of degrees of freedom for each set. We
wish to show that the distribution of the sum of the separate $\chi^2$ is a
$\chi^2$-distribution for a number of degrees of freedom equal to the sum of the
degrees of freedom in each set. If this can be done, the results can be
extended without further proof to any number of sets by taking one set
of a pair to be a previous combination.

It should be noted that it is the combination of values of $\chi^2$ which is
being discussed and not the combination of sets of readings. With two
sets of readings, for example, the combination of $\chi^2$ and the degrees of
freedom discussed above imply two constraints or restrictions corresponding
to the calculation of a mean for each set. If the two sets were
combined before the calculation of $\chi^2$ and the number of degrees of
freedom, the result would imply that there was only one constraint,
corresponding to the calculation of a single mean for the combined single set
of readings.

Let the two sets of observations be designated $x_1, x_2, \ldots, x_{n_1}$ and
$y_1, y_2, \ldots, y_{n_2}$. Since every observation in each set is independent of
the others, the joint probability for the existence of the two sets will
involve the total sum of squares of the differences between the readings
and the mean $\mu$ of the parent distribution [see Eq. (12-17)]. It is
convenient for the present purpose to break this sum up into two parts. The
joint probability is

$$d\Phi = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n_1+n_2} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum (x_i - \mu)^2 + \sum (y_i - \mu)^2\right]\right\} \prod dx_i \prod dy_i, \tag{12-36}$$


where $i = 1, \ldots, n_1$ for $x_i$ and $i = 1, \ldots, n_2$ for $y_i$. The various factors
in this equation can be treated just as before until one arrives at a
combined distribution corresponding to Eq. (12-33), which now has the form

$$\varphi(w_x^2, w_y^2, f_x, f_y)\, d(w_x^2)\, d(w_y^2) = P(f_x)P(f_y)\,(w_x^2)^{(f_x-2)/2}\,(w_y^2)^{(f_y-2)/2}\, e^{-(w_x^2+w_y^2)/2}\, d(w_x^2)\, d(w_y^2). \tag{12-37}$$

We again go through the now familiar process of changing integration
variables. This time, let

$$w_x = w\cos\theta, \qquad w_y = w\sin\theta,$$

so that

$$w_x^2 = w^2\cos^2\theta, \qquad w_y^2 = w^2\sin^2\theta,$$

and

$$w^2 = w_x^2 + w_y^2.$$


Recalling that the integration variables were $w_x^2$ and $w_y^2$ rather than $w_x$ and
$w_y$, and maintaining $w^2$ as the new integration variable, we write

$$d(w_x^2)\, d(w_y^2) = J(w_x^2, w_y^2; w^2, \theta)\, d(w^2)\, d\theta$$

corresponding to Eq. (12-30), and find with the aid of Appendix 7 that

$$J(w_x^2, w_y^2; w^2, \theta) = 2w^2 \sin\theta\cos\theta.$$

Then, referring back to Eq. (12-37), we obtain

$$w^2\,(w_x^2)^{(f_x-2)/2}\,(w_y^2)^{(f_y-2)/2} = (w^2)^{(f-2)/2}\,(\cos\theta)^{f_x-2}\,(\sin\theta)^{f_y-2},$$

where

$$f = f_x + f_y.$$

The distribution now has the form

$$\varphi(w^2, \theta, f_x, f_y, f)\, d(w^2)\, d\theta = 2P(f_x)P(f_y)\,(w^2)^{(f-2)/2}\, e^{-w^2/2}\,(\sin\theta)^{f_y-1}\,(\cos\theta)^{f_x-1}\, d(w^2)\, d\theta.$$

Since we are only interested in the distribution of $w^2$, regardless of
whether it is made up of a large $w_x^2$ and small $w_y^2$, or vice versa, we integrate
over those values of $\theta$ which cover all these contingencies, which is from
0 to $\pi/2$. Thus we must evaluate

$$\int_0^{\pi/2} (\cos\theta)^{f_x-1}\,(\sin\theta)^{f_y-1}\, d\theta.$$

It is shown in Appendix 8 that, regardless of the oddness or evenness of
$f_x$ and $f_y$, the value of this integral can be written as

$$\int_0^{\pi/2} (\cos\theta)^{f_x-1}\,(\sin\theta)^{f_y-1}\, d\theta = \frac{\Gamma(f_x/2)\,\Gamma(f_y/2)}{2\,\Gamma(f/2)}. \tag{12-38}$$
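Equation (12-38) and its effect on the normalization can be checked numerically for a few pairs of degrees of freedom. The sketch below assumes only Eqs. (12-34) and (12-38); the function names and the chosen pairs are illustrative.

```python
import math

def norm_factor(f):
    """P(f) of Eq. (12-34)."""
    return 1.0 / (math.gamma(f / 2) * 2 ** (f / 2))

def theta_integral(fx, fy, steps=20000):
    """Midpoint-rule estimate of the left-hand side of Eq. (12-38)."""
    h = (math.pi / 2.0) / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        total += math.cos(theta) ** (fx - 1) * math.sin(theta) ** (fy - 1)
    return total * h

def gamma_ratio(fx, fy):
    """Right-hand side of Eq. (12-38)."""
    return math.gamma(fx / 2) * math.gamma(fy / 2) / (2.0 * math.gamma((fx + fy) / 2))

pairs = [(1, 1), (2, 3), (4, 9), (5, 5)]
for fx, fy in pairs:
    combined = 2.0 * norm_factor(fx) * norm_factor(fy) * gamma_ratio(fx, fy)
    # `combined` should equal P(fx + fy), the normalization for the summed chi^2.
    print(fx, fy, theta_integral(fx, fy), gamma_ratio(fx, fy), combined, norm_factor(fx + fy))
```

Agreement of the last two printed columns is the numerical statement that the combined normalization is the one belonging to $f_x + f_y$ degrees of freedom.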

When this result is combined with $2P(f_x)P(f_y)$, the distribution of
$(w^2)$, that is, $\varphi(w^2, f)\, d(w^2)$, takes the form of Eq. (12-33) with $P(f)$
given by Eq. (12-34) except that now $f$ is the sum of the degrees of freedom
of the two separate sets we started with. It should be reemphasized that
this result applies for sets which are independent of each other.

                         Table 12-2

  Experiment    Reading     Analysis

       1          -4.5      $\bar{x} = -0.9$
       2           2.5
       3          -1.5      $\sum x^2 = 96.5$
       4          -5.5      $(\sum x)^2/10 = 8.1$
       5          -0.5
       6          -1.5      $\therefore \sum v^2 = 88.4$
       7           1.5      $\chi^2 = 7.87$
       8          -3.5
       9           4.5      $0.5 < P(\chi^2 > 7.87) < 0.7$
      10          -0.5      with $f = 9$


We are now ready to consider some examples of the application of $\chi^2$
testing. The first four examples will be taken from the dart-dropping
experiment described in Chapter 7. The first three will be somewhat
artificial in that we will use arbitrary sets of data. The purpose is to illustrate
the arithmetic and the way in which Table B is used. For these three
illustrations we will assume that the parameters of the distribution,
particularly the standard deviation, are known from a priori knowledge
to be those in the last row of Table 10-5. In particular, the value of $\sigma$
used in the first three examples is not calculated from the data of those
examples.

The first example is shown in Table 12-2. Some discussion of the last
column is necessary. First, the reader should show that $\sum v^2$ can be
computed directly from

$$\sum v^2 = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$

[see Problem 10-1]. Second, as in the derivation, $\chi^2$ is just the sum of the
squares of the residuals divided by the square of the standard deviation
for the distribution. Third, since we are assuming $\sigma$ to be given, as in fact
it is, so far as this set of data is concerned, the only restriction involved is
in the calculation of $\bar{x}$, that is, $\sum v = 0$. The number of degrees of freedom
is one less than the number of readings. Fourth, when we refer to Table B
for $f = 9$, we find that a $\chi^2$ as large as this could be expected to occur
by chance more than half the time even when calculated on the basis of
the actual distribution. It is interesting then to compute $\sigma$ from these
observations. The result is 3.13 as compared with the “true” $\sigma$ of 3.352.
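The arithmetic of Table 12-2 is short enough to reproduce directly. The sketch below uses the ten readings from the table and the assumed $\sigma = 3.352$; the variable names are illustrative.

```python
import math

# Readings from Table 12-2 and the a priori standard deviation used in the text.
readings = [-4.5, 2.5, -1.5, -5.5, -0.5, -1.5, 1.5, -3.5, 4.5, -0.5]
sigma = 3.352

n = len(readings)
mean = sum(readings) / n
sum_x2 = sum(x * x for x in readings)
# Sum of squared residuals from the shortcut: sum(x^2) - (sum x)^2 / n.
sum_v2 = sum_x2 - sum(readings) ** 2 / n
chi2 = sum_v2 / sigma ** 2
f = n - 1                              # one constraint: the mean was computed from the data
sigma_estimate = math.sqrt(sum_v2 / f)

print(mean, sum_x2, sum_v2, round(chi2, 2), f, round(sigma_estimate, 2))
```

Replacing `readings` with the columns of Tables 12-3 and 12-4 repeats the analysis for the other two artificial data sets.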

Suppose now the data in Table 12-3 have been obtained. The analysis
shows that only one time in 50 could one expect a $\chi^2$ as large as this to
occur by chance when the correct distribution was used in its computation.
This figure is in the range, 1 to 5 times in 100, where most statisticians would
conclude that it is safe to say that the data do not fit the assumed
distribution. The standard deviation computed from this set is 4.95.

                         Table 12-3

  Experiment    Reading     Analysis

       1          -9.5      $\bar{x} = -0.9$
       2           6.5
       3           0.5      $\sum x^2 = 228.5$
       4           0.5      $(\sum x)^2/10 = 8.1$
       5          -6.5
       6          -4.5      $\therefore \sum v^2 = 220.4$
       7           2.5      $\chi^2 = 19.62$
       8          -0.5
       9           4.5      $P(\chi^2 > 19.62) \approx 0.02$
      10          -2.5      with $f = 9$

Next, let us consider the data in Table 12-4. Here the converse is true.
It is very unlikely that data drawn from a distribution having $\sigma = 3.352$
would have as small a value of $\chi^2$ as is shown by the data of Table 12-4.
The standard deviation calculated from these data is only 0.70, which is
0.21 of the assumed value.

Finally, of course, the question arises as to how well the 498 points now
in the dart-dropping data fit the normal distribution according to a $\chi^2$
test. The analysis shown in Table 12-5 demands the simultaneous application
of many of the ideas which have previously been considered more or
less separately.

                         Table 12-4

  Experiment    Reading     Analysis

       1          -1.5      $\bar{x} = -0.9$
       2          -1.5      $\sum x^2 = 12.5$
       3          +0.5
       4          -0.5      $(\sum x)^2/10 = 8.1$
       5          -1.5      $\therefore \sum v^2 = 4.4$
       6          -0.5      $\chi^2 = 0.39$
       7          -1.5
       8          -0.5      $P(\chi^2 > 0.39) > 0.99$
       9          -0.5      $P(\chi^2 < 0.39) < 0.01$
      10          -1.5      with $f = 9$

The first column of Table 12-5 gives the various intervals into which
the target was divided. The second gives the expected number of drops
in that interval according to a normal distribution with $\mu = -0.7229$
and $\sigma = 3.352$. The third column shows the observed number of drops
in that interval, while the fourth shows the elementary probability of the
single event, a landing of the dart in that interval.

                              Table 12-5

   $x_i$    Expected   Observed    $P_i$      $\sigma_i^2$    $\chi_i^2$

  -10.5       0.85        2                 \
   -9.5       1.99        2                  }   6.826        0.001
   -8.5                   3                 /
   -7.5       7.72       12                      7.599        2.410
   -6.5      13.55                              13.177
   -5.5      21.56       15
   -4.5      31.37       35       .0630         29.397        0.448
   -3.5      42.08       45       .0845         38.525        0.221
   -2.5      51.34       49       .1031         46.050        0.119
   -1.5      57.42       54       .1153         50.799        0.230
   -0.5      59.06       70       .1186         52.058        2.299
    0.5      55.23       48       .1109         49.103        1.064
    1.5      47.41       55       .0952         42.896        1.343
    2.5      37.40       32       .0751         34.591        0.843
    3.5      26.84       32       .0539         25.396        1.048
    4.5      17.68       15       .0355         17.052        0.421
    5.5      10.71        8       .0215         10.477        0.701
    6.5       5.88        4       .0118          5.807        0.608
    7.5       2.94        2       .0059     \
    8.5       1.39        2       .0028      }   4.881        0.878
    9.5       0.60        3       .0012     /

  $\sum \chi_i^2 = 15.676$,  $f = 17 - 3 = 14$
  $0.3 < P(\chi^2 > 15.676) < 0.5$

There are 21 lines in the table. We can regard the experiment as 498
trials at the rolling of a “die” having 21 sides such that the probability
of a particular one of the sides turning up on a single trial is the figure
given in the fourth column. We saw in Section 12-1 how to cope with this
situation. For each side $i$, the expectation value is $498P_i$, and the square
of the standard deviation in this number is $498P_i(1 - P_i)$. The third
column then shows the single observation made on the expectation value
$498P_i$, and the fifth column shows the square of the appropriate standard
deviation. The last column shows $\chi_i^2$ for each of these observations; it is
the square of the single error at each point divided by the
square of the standard deviation. Each such value of $\chi^2$, considered by
itself, is a single observation with no constraint and has one degree of
freedom.

The reader will note that at each end of the distribution, three intervals
have been treated as one observation. Fry [2] has shown that the accuracy
of the computed value of $\chi^2$ drops when the number of events which
constitute an observation is too small; 5 is a convenient number to use
as the limit. Thus in our case, one might say that we have converted the
21-sided “die” to one with 17 sides.

Reference to Table B for one degree of freedom shows that the probability
of occurrence of values of $\chi^2$ of the magnitude in Table 12-5, or greater,
ranges between about 10 and 98%.

A better test involves the additivity property. The sum of all these
values of $\chi^2$ is 15.676. However, we must recognize that the number of
degrees of freedom for the sum of all of them is not quite 17. The additivity
property applies to independent observations; when we add them all, the
observations are not all independent. There are three constraints: they
come from the use of the data to compute the mean, the use of the data
to compute the standard deviation, and the fact that the number of events
must add up to 498. Then the number of degrees of freedom for $\sum \chi^2$ is
14. Reference to Table B for $f = 14$ shows that a value of $\chi^2 > 15.676$
has a chance of occurrence of somewhere between 30 and 50 times out of
100. We conclude that the normal distribution with $\mu = -0.7229$ and
$\sigma = 3.352$ is a satisfactory description of these data.
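The entries in the last two columns of Table 12-5 follow from the rule stated above: the variance of each count is $498P_i(1 - P_i)$, and $\chi_i^2$ is the squared error divided by that variance. The sketch below recomputes three of the fully legible rows; the function name is hypothetical, and the small differences from the table come from rounding in the printed $P_i$ and expected counts.

```python
N_TRIALS = 498  # total number of dart drops

# (interval center, expected count, observed count, P_i) copied from Table 12-5.
rows = [
    (-4.5, 31.37, 35, 0.0630),
    (-0.5, 59.06, 70, 0.1186),
    (0.5, 55.23, 48, 0.1109),
]

def cell_variance_and_chi2(expected, observed, p, n_trials=N_TRIALS):
    """Return (variance, chi^2) for one interval treated as a binomial count."""
    variance = n_trials * p * (1.0 - p)
    return variance, (observed - expected) ** 2 / variance

results = [cell_variance_and_chi2(e, o, p) for (_, e, o, p) in rows]
for (x, _, _, _), (variance, chi2) in zip(rows, results):
    print(x, round(variance, 3), round(chi2, 3))

# Degrees of freedom for the summed chi^2: 17 grouped intervals minus 3 constraints.
degrees_of_freedom = 17 - 3
```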

The last example to be shown is a coin-toss experiment. We will see
that the analysis is much like that in the previous example. However, the
distribution is binomial, and the probability of occurrence of an individual
event is assumed to be known from a priori considerations. Thus when
all the separate values of $\chi^2$ are summed, there is only one constraint.
This is another way of saying that the numbers of tosses which result in
a particular event must add up to the total number of tosses.

The example to be used is the repeated tossing of four coins simultaneously
and the observation, after each toss, of the number of heads which
appear. With four coins there are five possible events; that is, anywhere
from zero to four heads can appear. The probability of each of these events
can be computed according to the methods of Section 6-6. Computation
of the expected number of times each event will occur, and its standard
deviation, is as described in Section 12-1. Table 12-6, basically the same
as 12-5, though not as detailed, shows the results of the experiment and
the subsequent analysis. We conclude that the results of the experiment
are not inconsistent with the assumption that the coins are good and that
they were tossed randomly.

                               Table 12-6*

  Number of
  heads, $i$   $p_i$    $50p_i$   obs.   $\chi^2$    $640p_i$   obs.   $\chi^2$

      0        1/16      3.125      4      0.26         40        38     0.11
      1        1/4      12.5       11       .24        160       152      .53
      2        3/8      18.75      21       .43        240       245      .17
      3        1/4      12.5       12       .03        160       165      .21
      4        1/16      3.125      2       .43         40        40     0

                                           1.39                          1.02

  $f = 4$      $P(\chi^2 > 1.39) \approx 0.85$      $P(\chi^2 > 1.02) \approx 0.91$

* These data were taken years ago when one of the authors (G.H.W.) was a
student of the other. The fifty tosses were made by G.H.W.; the 640 tosses are
the sum of those made by the whole class.
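The two $\chi^2$ totals in Table 12-6 can be reproduced from the observed counts alone. The sketch below assumes fair coins, so that the probability of $i$ heads in four coins is $\binom{4}{i}/16$; the function name is illustrative.

```python
from math import comb

def chi2_for_head_counts(observed, n_coins=4):
    """Sum of (observed - N p_i)^2 / (N p_i (1 - p_i)) over the possible head counts,
    with p_i the binomial probability of i heads for fair coins."""
    n_tosses = sum(observed)
    total = 0.0
    for heads, count in enumerate(observed):
        p = comb(n_coins, heads) / 2 ** n_coins
        expected = n_tosses * p
        variance = n_tosses * p * (1.0 - p)
        total += (count - expected) ** 2 / variance
    return total

chi2_50 = chi2_for_head_counts([4, 11, 21, 12, 2])         # the 50 tosses
chi2_640 = chi2_for_head_counts([38, 152, 245, 165, 40])   # the 640 tosses
degrees_of_freedom = 5 - 1  # five events, one constraint (counts sum to the total)

print(round(chi2_50, 2), round(chi2_640, 2), degrees_of_freedom)
```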


12-6 STUDENT'S t DISTRIBUTION: COMPARISON OF AVERAGES

In order to give a sufficiently useful description of $\chi^2$-testing, we had to
give some consideration to the treatment of independent sets of observations
drawn from the same parent distribution. We shall continue this
subject in this section with the object of finding a method of numerical
assessment of how much the averages of two such sets can be expected to
differ from each other by chance. As before, we propose that when the
difference is much greater than expected, we judge that the two sets were
not in fact drawn from the same distribution. For the variable $t$ we assume
that the distributions are of the same form and with the same values of
$\sigma$ for the two sets of data; we wish to see whether the distributions yield
the same expectation values.

The joint probability with which we must work is given by Eq. (12-36),
but we shall proceed in a somewhat different direction from the discussion
in the previous section. Two alternative expressions can be written for
the sum of the squares of the real errors which appears in Eq. (12-36).
We can proceed in the same manner as in the derivation of Eq. (12-22)
and obtain

$$\sum (\text{error})^2 = q_x^2 + q_y^2 + n_1(\bar{x} - \mu)^2 + n_2(\bar{y} - \mu)^2, \tag{12-39}$$

where

$$q_x^2 = \sum (x_i - \bar{x})^2, \qquad q_y^2 = \sum (y_i - \bar{y})^2.$$

On the other hand, the sums can be lumped together before expansion
so that they become

$$\sum (\text{error})^2 = \left(\sum x_i^2 + \sum y_i^2\right) - 2\mu\left(\sum x_i + \sum y_i\right) + (n_1 + n_2)\mu^2. \tag{12-40}$$

It is clear that

$$\sum x_i + \sum y_i = n_1\bar{x} + n_2\bar{y}$$

and that the mean of all the readings from both sets, designated as $m$, is

$$m = \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}. \tag{12-41}$$



Define $Q$ in a similar fashion to the $q$ of Eq. (12-20), so that

$$Q^2 = \sum (x_i - m)^2 + \sum (y_i - m)^2.$$

Expansion of this and the use of Eq. (12-41) lead to

$$Q^2 = \left(\sum x_i^2 + \sum y_i^2\right) - \frac{(n_1\bar{x} + n_2\bar{y})^2}{n_1 + n_2}.$$

Thus, by going back to Eq. (12-40), we find

$$\sum (\text{error})^2 = Q^2 + \frac{(n_1\bar{x} + n_2\bar{y})^2}{n_1 + n_2} - 2\mu(n_1\bar{x} + n_2\bar{y}) + (n_1 + n_2)\mu^2. \tag{12-42}$$

But Eqs. (12-39) and (12-42) are expressions for the same quantity.
Therefore,

$$Q^2 = q_x^2 + q_y^2 + \frac{n_1 n_2}{n_1 + n_2}(\bar{x} - \bar{y})^2, \tag{12-43}$$

where we see that the previously mentioned quantity of special interest,
$(\bar{x} - \bar{y})$, has turned up. Equation (12-43) is now to be used to replace
$Q^2$ in Eq. (12-42). At the same time, we note that when $(n_1 + n_2)^{-1}$ is
factored out of the last three terms in Eq. (12-42), the terms constitute a
perfect square, the expression for which is simplified by the use of Eq.
(12-41). The result is

$$\sum (\text{error})^2 = q_x^2 + q_y^2 + \frac{n_1 n_2}{n_1 + n_2}(\bar{x} - \bar{y})^2 + (n_1 + n_2)(m - \mu)^2. \tag{12-44}$$
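The decomposition of the sum of squared errors can be confirmed on any pair of samples. The sketch below uses two small made-up data sets and an arbitrary parent mean; all of the numbers are hypothetical and serve only to exercise the algebra of Eqs. (12-43) and (12-44).

```python
# Hypothetical samples and parent mean, chosen only to test the identities.
x = [1.0, 2.0, 4.0]
y = [3.0, 5.0, 6.0, 8.0]
mu = 2.5

n1, n2 = len(x), len(y)
x_bar = sum(x) / n1
y_bar = sum(y) / n2
m = (n1 * x_bar + n2 * y_bar) / (n1 + n2)   # grand mean of both sets

qx2 = sum((v - x_bar) ** 2 for v in x)      # residual sum of squares, x set
qy2 = sum((v - y_bar) ** 2 for v in y)      # residual sum of squares, y set
Q2 = sum((v - m) ** 2 for v in x + y)       # residuals about the grand mean
difference_term = n1 * n2 / (n1 + n2) * (x_bar - y_bar) ** 2

total_error2 = sum((v - mu) ** 2 for v in x + y)   # sum of squared real errors
decomposed = qx2 + qy2 + difference_term + (n1 + n2) * (m - mu) ** 2

print(Q2, qx2 + qy2 + difference_term)   # the two sides of the Q^2 identity
print(total_error2, decomposed)          # the two sides of the full decomposition
```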

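Thus the argument of the exponential in Eq. (12-36) becomes

$$\text{argexp} = -\frac{q_x^2 + q_y^2}{2\sigma^2} - \frac{(\bar{x} - \bar{y})^2}{2\left[\sigma\sqrt{1/n_1 + 1/n_2}\right]^2} - \frac{(m - \mu)^2}{2\left[\sigma/\sqrt{n_1 + n_2}\right]^2}. \tag{12-45}$$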

When we use Eq. (12-39) as the sum of the squares of the errors, we are
led to proceed with $\prod dx_i$ and $\prod dy_i$ exactly as for the derivation of the
$\chi^2$-distribution. The individual factors of these two products are
independent of each other, and the results of this procedure are just as
before, except that there are two sets. This conclusion is not altered
in any way by rewriting the sums of the squares of the errors, which led
to Eq. (12-45). When $d\Phi'$ is used then to designate the factors of interest
which are to be saved in Eq. (12-36), $d\Phi'$ becomes

$$d\Phi' = \frac{\sqrt{n_1 n_2}}{2\pi\sigma^2}\, \frac{(q_x/\sigma)^{f_x-1}\,(q_y/\sigma)^{f_y-1}}{\Gamma(f_x/2)\, 2^{(f_x-2)/2}\, \Gamma(f_y/2)\, 2^{(f_y-2)/2}} \exp\left\{-\left[\frac{q_x^2 + q_y^2}{2\sigma^2} + \frac{(\bar{x} - \bar{y})^2}{2\left(\sigma\sqrt{1/n_1 + 1/n_2}\right)^2} + \frac{(m - \mu)^2}{2\left(\sigma/\sqrt{n_1 + n_2}\right)^2}\right]\right\} d\left(\frac{q_x}{\sigma}\right) d\left(\frac{q_y}{\sigma}\right) d\bar{x}\, d\bar{y}. \tag{12-46}$$

Further changes of variables must be made now in order to integrate
this expression partially. That is, we wish to know the distribution of
$(\bar{x} - \bar{y})$ regardless of the values of $m$ or of $q_x$ and $q_y$. This distribution
can be found by integrating over all possible values of $m$, $q_x$, and $q_y$ to see
what is left for the difference $(\bar{x} - \bar{y})$. It is to be remembered that, by the
nature of their definitions, $q_x$ and $q_y$ are positive numbers.

In addition to $m$ defined in Eq. (12-41), we let three new variables be

$$q = \sqrt{q_x^2 + q_y^2}, \qquad \theta = \tan^{-1}\frac{q_y}{q_x}, \qquad t = \frac{(\bar{x} - \bar{y})\sqrt{f}}{q\sqrt{1/n_1 + 1/n_2}}, \tag{12-47}$$

where

$$f = f_x + f_y = n_1 + n_2 - 2. \tag{12-48}$$

The old variables, expressed in terms of the new ones, are

$$q_x = q\cos\theta, \qquad q_y = q\sin\theta,$$

$$\bar{x} = m + \frac{n_2\, q\, t}{\sqrt{f\, n_1 n_2 (n_1 + n_2)}}, \qquad \bar{y} = m - \frac{n_1\, q\, t}{\sqrt{f\, n_1 n_2 (n_1 + n_2)}}.$$


With the aid of the previous discussion and Appendix 7, we can show that
the required Jacobian $J(q_x, q_y, \bar{x}, \bar{y}; q, \theta, t, m)$ is

$$J = \frac{q^2}{\sqrt{f}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

so that

$$dq_x\, dq_y\, d\bar{x}\, d\bar{y} \rightarrow \frac{q^2}{\sqrt{f}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\; dq\, d\theta\, dt\, dm.$$

It is immediately apparent that the set of factors

$$\frac{1}{\left(\sigma/\sqrt{n_1 + n_2}\right)\sqrt{2\pi}} \exp\left[-\frac{(m - \mu)^2}{2\left(\sigma/\sqrt{n_1 + n_2}\right)^2}\right] dm$$

can be divided out with no further reference to $m$ being left behind. That
is, as expected, the error in the grand average of all the readings from the
two sets is normally distributed with the weight $(n_1 + n_2)$. The integration
of its distribution function over all values of $m$ will result in unity.

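The remaining part $d\Phi''$ is written as

$$d\Phi'' = \frac{\left(\dfrac{q\cos\theta}{\sigma}\right)^{f_x-1} \left(\dfrac{q\sin\theta}{\sigma}\right)^{f_y-1}}{\Gamma(f_x/2)\, 2^{(f_x-2)/2}\, \Gamma(f_y/2)\, 2^{(f_y-2)/2}}\; \frac{q^2}{\sigma^3\sqrt{2\pi f}}\; \exp\left[-\frac{q^2}{2\sigma^2}\left(1 + \frac{t^2}{f}\right)\right] dq\, d\theta\, dt. \tag{12-49}$$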

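Our original intention, which we had stated before we changed the
variables, was to integrate over all values of $m$, $q_x$, and $q_y$. The first was
done. The integration over the latter two can be accomplished with the
new variables by integrating over $q$ from 0 to $\infty$ and over $\theta$ from 0 to
$\pi/2$. Consider the first of these. The pertinent integral, taken from
Eq. (12-49), is

$$\int_0^\infty q^f \exp\left[-\frac{q^2}{2\sigma^2}\left(1 + \frac{t^2}{f}\right)\right] dq.$$

Appendix 6 shows that the value of this integral is

$$2^{(f-1)/2}\, \sigma^{f+1}\, \Gamma\left(\frac{f+1}{2}\right) \left(1 + \frac{t^2}{f}\right)^{-(f+1)/2},$$

so that

$$d\Phi'' = \frac{2\,\Gamma\left(\dfrac{f+1}{2}\right)}{\sqrt{\pi f}\;\Gamma(f_x/2)\,\Gamma(f_y/2)}\; \frac{(\cos\theta)^{f_x-1}\,(\sin\theta)^{f_y-1}\, d\theta\, dt}{\left(1 + \dfrac{t^2}{f}\right)^{(f+1)/2}}. \tag{12-50}$$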

The remaining integration over $\theta$ was described in Section 12-5. When
this has been carried out, the distribution of $t$ is found to be

$$\varphi(t, f)\, dt = \frac{\Gamma\left(\dfrac{f+1}{2}\right)}{\sqrt{\pi f}\;\Gamma\left(\dfrac{f}{2}\right)}\; \frac{dt}{\left(1 + \dfrac{t^2}{f}\right)^{(f+1)/2}}. \tag{12-51}$$

This is the distribution discussed by W. S. Gosset* in 1908 with the
pen name “Student” and referred to ever afterward as “Student's t
distribution.”
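The normalization of Eq. (12-51) can be checked numerically. The sketch below is an illustration under one assumption of convenience: the infinite range of $t$ is mapped onto a finite range of $\theta$ through $t = \sqrt{f}\tan\theta$, which makes a plain midpoint rule adequate. The function names are hypothetical.

```python
import math

def t_density(t, f):
    """phi(t, f) of Eq. (12-51)."""
    coefficient = math.gamma((f + 1) / 2) / (math.sqrt(math.pi * f) * math.gamma(f / 2))
    return coefficient * (1.0 + t * t / f) ** (-(f + 1) / 2)

def total_probability(f, steps=20000):
    """Integrate phi(t, f) over all t using t = sqrt(f) tan(theta),
    with theta running from -pi/2 to +pi/2 (midpoint rule)."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = -math.pi / 2.0 + (k + 0.5) * h
        t = math.sqrt(f) * math.tan(theta)
        dt_dtheta = math.sqrt(f) / math.cos(theta) ** 2
        total += t_density(t, f) * dt_dtheta
    return total * h

normalizations = {f: total_probability(f) for f in (1, 4, 13)}
for f, value in normalizations.items():
    print(f, round(value, 6))
```

Because the density is even in $t$, integrating over positive $t$ only and doubling gives the same totals.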

The quantity $t$ is defined in Eq. (12-47). It is seen to be proportional
to the difference between the means of two separate sets of observations
divided by a quantity $q/\sqrt{f}$. The latter is a sort of average of the estimates
(for the separate sets) of the standard deviation of the parent distribution.
That is,

$$\frac{q}{\sqrt{f}} = \left[\frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{(n_1 - 1) + (n_2 - 1)}\right]^{1/2}. \tag{12-52}$$

The distribution of $t$, however, depends only on the total number of
degrees of freedom $f$.

Tests based on the $t$-distribution are similar to $\chi^2$-tests in that one is
interested in the integral of the distribution function from some value to
$\infty$. When such tests are made, one should realize that, unlike the
$\chi^2$-distribution, the $t$-distribution (Eq. 12-51) is symmetric about the origin.
Thus, to verify the normalization, one must integrate from $-\infty$ to $+\infty$,
or, equivalently, multiply the integral from 0 to $\infty$ by two. It is suggested
that the reader verify this, as well as one of the entries in Table C on p. 234.
The procedures have been repeated sufficiently often that no further
illustration is deemed necessary, although a couple of hints might be helpful.
The substitution

$$t^2 = f\tan^2\theta$$

is useful, and

$$\int_0^{\pi/2} (\cos\theta)^{f-1}\, d\theta$$

can be found from Eq. (12-38) by setting $f_x = f$ and $f_y = 1$. The $f$ which
appears on the right-hand side of Eq. (12-38) then has the value $f + 1$.

We note that the tabulated values of $t$ for particular values of the
probability satisfy the equation

$$\mathcal{P}(t, f) = \int_{-\infty}^{-t} \varphi(t, f)\, dt + \int_{t}^{\infty} \varphi(t, f)\, dt.$$

Thus the table lists critical values of $|t|$, which is proper since it is just as
likely that the difference between the averages of two samples will be
positive as negative.

* W. S. Gosset, Biometrika, VI, 1 (1908).

A classic use of the $t$-test is in chemical analysis. For instance, an
analysis is run for a particular element by two different methods, or two
methods of preparation are used to produce what is, hopefully, the same
compound or alloy. In the latter example, the same method of analysis
would be used on the two products. In either case, one expects to get
different answers in the two analyses; the question is whether the
difference is within the bounds of what one might get because of the incidence
of chance errors even if, in the first example, the methods had no relative
systematic error or, in the second example, the products were identical.
The example shown in Table 12-7 should serve to illustrate the use of the
$t$-test for this purpose.

                         Table 12-7

   Sample 1       Sample 2

     0.26           0.29
      .27            .30
      .28            .31
      .24            .30
      .25            .29
      .29
      .27
      .27
      .28
      .28

   Av. 0.269      Av. 0.298
   $f_1 = 9$      $f_2 = 4$

   Analysis

   $$\frac{q}{\sqrt{f}} = \left[\frac{0.002090 + 0.000280}{9 + 4}\right]^{1/2} = 0.0135$$

   $$t = \frac{0.298 - 0.269}{0.0135\left[\frac{1}{10} + \frac{1}{5}\right]^{1/2}} = 3.92$$

   $f = 9 + 4$

   $P(t > 3.92) < 0.01$

   Conclusion: the samples are different.
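The analysis in Table 12-7 can be retraced from the raw readings using Eqs. (12-47) and (12-52). The sketch below is illustrative only; the helper names are hypothetical, and the final comparison against Table C is left to the reader as in the text.

```python
import math

# Readings from Table 12-7.
sample_1 = [0.26, 0.27, 0.28, 0.24, 0.25, 0.29, 0.27, 0.27, 0.28, 0.28]
sample_2 = [0.29, 0.30, 0.31, 0.30, 0.29]

def mean(values):
    return sum(values) / len(values)

def residual_sum_of_squares(values):
    """Sum of squared deviations from the sample's own mean (q^2 for that set)."""
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values)

n1, n2 = len(sample_1), len(sample_2)
f = (n1 - 1) + (n2 - 1)   # total degrees of freedom, Eq. (12-48)

# Pooled estimate of the standard deviation, q / sqrt(f), Eq. (12-52).
pooled_sd = math.sqrt(
    (residual_sum_of_squares(sample_1) + residual_sum_of_squares(sample_2)) / f
)

# t of Eq. (12-47); the sign only reflects which average is written first.
t = (mean(sample_2) - mean(sample_1)) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))

print(f, round(pooled_sd, 4), round(t, 2))
```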


12-7 THE F-DISTRIBUTION: ANALYSIS OF VARIANCE

Equation (12-37) gives the joint distribution of the ratios of sums of
squares of residuals for samples of two different sizes to the square of the
standard deviation for the distribution from which the samples were
drawn. Immediately following that equation we used it to show the
additive property of $\chi^2$. In Section 12-6 we used the immediate antecedent
of Eq. (12-37) to derive a distribution by means of which we could discuss
the significance of an observed difference between the averages of the
results for the two samples. In this section we wish to use Eq. (12-37) to
derive another distribution, the $F$-ratio distribution, which is applicable
to this case of two samples supposedly drawn from the same parent
distribution. Whereas the $t$-test tests for differences in averages, the
$F$-test tests for differences, via values of their ratios, in variances; variance
is defined as the square of the standard deviation.

This new distribution can be derived quickly, since so much of the work
has been done in previous sections. Given Eq. (12-37), we define a new
variable as follows:

$$g = \frac{w_x^2/f_x}{w_y^2/f_y}. \tag{12-53}$$

We wish to know the distribution of $g$ regardless of the values of $w_x$ and
$w_y$. Hence we substitute in Eq. (12-37)

$$w_x^2 = \frac{f_x}{f_y}\, w_y^2\, g, \qquad d(w_x^2) = \frac{f_x}{f_y}\, w_y^2\, dg,$$

and then integrate over all values of $w_y$. The reader should verify that the
result is

$$\varphi(g, f_x, f_y)\, dg = \frac{\Gamma(f/2)}{\Gamma(f_x/2)\,\Gamma(f_y/2)} \left(\frac{f_x}{f_y}\right)^{f_x/2} \frac{g^{(f_x-2)/2}}{\left(1 + \dfrac{f_x}{f_y}\, g\right)^{f/2}}\, dg, \tag{12-54}$$

where, as usual,

$$f = f_x + f_y.$$

We recall that the $w_x^2$ and $w_y^2$ of Eq. (12-53) are each ratios of the sums
of squares of the residuals for the corresponding samples to the variance,
that is, $\sigma^2$ of the single parent distribution function from which each
sample was drawn. Thus Eq. (12-53) could have been written

$$g = \frac{q_x^2/f_x}{q_y^2/f_y},$$

which is just the ratio of the estimate of $\sigma^2$ as given by the $x$ sample to
the estimate of $\sigma^2$ given by the $y$ sample. Thus, when we write

$$\mathcal{P}(F, f_x, f_y) = \int_F^{\infty} \varphi(g, f_x, f_y)\, dg, \tag{12-55}$$

$\mathcal{P}(F, f_x, f_y)$ represents the probability that the ratio of these two estimates
will be greater than or equal to $F$, purely by chance, when the variances
for the two samples are in fact the same. As with $\chi^2$, the range, and $t$,
values of $F$ are tabulated for particular values of $\mathcal{P}(F, f_x, f_y)$. The latter
are generally 0.05 and 0.01. However, since this distribution is asymmetric,
which is evident from the way in which $f_x$ and $f_y$ appear in Eq. (12-54),
the $F$ tables must be much more extensive than those we have met so far.

Table D on p. 236 shows that all the tabulated values of $F$ are greater
than or equal to unity. The designation “Degrees of freedom for greater
mean square” refers to the degrees of freedom used to calculate the larger
of the two variances; in order to use the table directly, we insert the value
of this variance into the numerator. However, just as either of two
averages could be used first when one is calculating $t$, so could either of
two variances be used in the numerator to calculate $F$ for the simpler type
of application which we will consider first. The difficulty in the
$t$-distribution was easily surmounted because of the symmetry of the distribution.
Now, with the $F$-distribution, we are faced with the same lack of simplicity
in the assessment of probability information that we met in our discussion
in Section 9-4 of the standard deviation for an asymmetric distribution.

Consider the table entry at the 5% level for both $f_x$ and $f_y$ equal to four,
for example. We see that, working at this level, an $F$-ratio of 10 would
indicate inequality of the variances. A ratio of 0.1, obtained by inverting
the ratio, must therefore also indicate inequality of the variances. In
particular, then, a ratio between 6.39 and 1/6.39 would have a 90%
(not 95%) probability of occurring by chance when in fact the variances
are equal.

We are thus led to examine the effect on Eqs. (12-54) and (12-55) of
making the substitution

$$u = 1/g.$$

It is left as an exercise for the reader to show that

$$\int_F^{\infty} \varphi(g, f_x, f_y)\, dg = \int_0^{1/F} \varphi(u, f_y, f_x)\, du, \tag{12-56}$$

which demonstrates the obvious fact that a ratio of variances has the same
distribution regardless of which one is in the numerator. Equation (12-56)
also shows how the tables can be extended to ratios less than unity.
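Equation (12-56) can be tested numerically. The sketch below assumes the standard form of the variance-ratio density for $f_x$ and $f_y$ degrees of freedom and the fact that it integrates to unity; the function names and the test point ($F = 6.00$ with 9 and 4 degrees of freedom) are illustrative choices.

```python
import math

def f_density(g, fx, fy):
    """Variance-ratio density for fx (numerator) and fy (denominator)
    degrees of freedom, in its standard form."""
    f = fx + fy
    coefficient = math.gamma(f / 2) / (math.gamma(fx / 2) * math.gamma(fy / 2))
    ratio = fx / fy
    return coefficient * ratio ** (fx / 2) * g ** ((fx - 2) / 2) * (1.0 + ratio * g) ** (-f / 2)

def integrate(func, a, b, steps=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))

F, fx, fy = 6.00, 9, 4

# Left side of Eq. (12-56): upper tail, obtained as 1 minus the integral from 0 to F.
upper_tail = 1.0 - integrate(lambda g: f_density(g, fx, fy), 0.0, F)

# Right side: lower tail of the density with the degrees of freedom interchanged.
swapped_lower_tail = integrate(lambda u: f_density(u, fy, fx), 0.0, 1.0 / F)

print(round(upper_tail, 4), round(swapped_lower_tail, 4))
```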

As an example, consider the data used to illustrate the $t$-test in Table
12-7. Here, $q_1^2/f_1 = 0.000232$ and $q_2^2/f_2 = 0.00007$, with $f_1 = 9$ and
$f_2 = 4$. With the greater mean square in the numerator, $F = 3.32$. The
critical value, as seen from Table D, at the 5% level is 6.00. We need to
know now the critical value of $F < 1$, still with 9 degrees of freedom for
the numerator mean square and 4 degrees of freedom for the denominator
mean square. We see from Eq. (12-56) that this is the reciprocal of the
value of $F$ obtained from Table D such that

$$0.05 = \int_F^{\infty} \varphi(g, 4, 9)\, dg;$$

$F$ is found to be 3.63. Thus there is a 90% chance that the observed value
of $F$ will lie between 0.28 and 6.00, even though the true variances are
equal for the two samples. The value 3.32 is therefore not inconsistent
with the equality of the variances. It is interesting, however, to note that
these data did indicate a real difference in the means of the two samples.
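The bookkeeping of this example is sketched below. The sums of squares are those of Table 12-7, and the two critical values (6.00 and 3.63) are the Table D entries quoted in the text; the variable names are illustrative.

```python
# Sums of squared residuals and degrees of freedom from Table 12-7.
q1_squared, f1 = 0.002090, 9
q2_squared, f2 = 0.000280, 4

mean_square_1 = q1_squared / f1   # estimate of the variance from sample 1
mean_square_2 = q2_squared / f2   # estimate of the variance from sample 2
F = mean_square_1 / mean_square_2  # greater mean square in the numerator

# Critical values quoted in the text from Table D at the 5% level.
upper_critical = 6.00          # 9 and 4 degrees of freedom
lower_critical = 1.0 / 3.63    # reciprocal of the entry for 4 and 9 degrees of freedom

consistent_with_equal_variances = lower_critical < F < upper_critical
print(round(F, 2), round(lower_critical, 2), consistent_with_equal_variances)
```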

It has been suggested [13] for cases like the above, where there is no 
criterion on the basis of which one could decide to place one or the other 
of the two variances in the numerator, that the larger variance be placed
in the numerator, and the critical value of F, read at the 5% level, say,
be interpreted as that which will be exceeded 10% of the time, by chance, 
when the variances are equal. Considering the large differences in critical 
values of F at the 1 and 5% levels for small samples, this is probably 
satisfactory for most work. We note from Table D, however, that this 
approximation approaches exactitude only when both samples are large. 
Thus, in the example used above, we worked with critical values of 6.00 
and 3.63, which differ by almost 50%. At values of f equal to 40 and 50, 
say, these two numbers become 1.63 and 1.66. 

Possibly the most extensive use of the F-distribution is in conjunction 
with the procedure called “analysis of variance.” This procedure can be 
described as a partition or separation of the total variance of a particularly 
organized set of observations into parts assignable to different sources. 
One of these sources will always be the random error assumed to be 
common to all the observations. The F-ratio distribution is then used to 
examine the ratio of the variances from the other sources to the error 
variance in order to assess the probability that other differences occur 
between the readings than those which could be assigned to the random 
error distribution. Thus there will be no problem here as to which variance 
is to be put into the denominator. 

In our discussion of the t-distribution and its use, we suggested as an
example that two different people performing the same experiment with 
the same equipment might well get different results. The tests proposed 
there were designed to make a critical examination of such a suggestion. 
Thus we make the hypothesis that the results are not “really” different — 
for instance, it is not really true that one of the observers is nearsighted 
and the other farsighted — but the results only appear to be different 
because of the chance distribution of the errors in the observations, all 
of which, according to the hypothesis, were drawn from the same popula- 
tion. The probability, under this assumption, of the event (namely, the 
size of the difference between the two average results) that did occur is 
then examined. If it is large, we conclude that the hypothesis is not contradicted.
If the probability of the actual event, under the hypothesis of no difference, 
turns out to be very small, we usually conclude that the hypothesis is 
erroneous. 

The principal use to which the F-distribution is put is, in a manner of 
speaking, an extension of this idea. To continue with an expanded version 
of the above example, one might imagine two, or three, or more people 
making observations on the elastic limit of a particular steel after it has 
been annealed and quenched from each of several successively higher 
temperatures. In addition to the questions raised earlier relative to the 
inherent random error vs. any systematic differences between the observers 
or their methods, there is now the equally interesting question as to whether
the heat treatment affects the elastic limit. To be able to answer the latter
question one must estimate the effect of changing observers, the effect of 
the heat treatment, and the random error that is hypothesized to be com- 
mon to all observers and heat treatments. 

Again we make what is called the “null” hypothesis, that there are no 
real differences between observers or between heat treatments, and then 
assess the probability of the observed result. That is, we determine certain 
values of $q^2/f$ and their ratios. Then the probability that these ratios
have the values that they are observed to have is determined when it is
supposed that all observations were drawn from a single population with
unique values for the mean $\mu$ and the standard deviation $\sigma$.

It is clear from the above discussion that we envisage an array of
numbers $X_{ij}$, which can be arranged as follows:

$$\begin{matrix}
X_{11} & X_{12} & \cdots & X_{1m} \\
X_{21} & X_{22} & \cdots & X_{2m} \\
\vdots & \vdots &        & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{nm}
\end{matrix} \tag{12-57}$$

where $i = 1, \ldots, n$ and $j = 1, \ldots, m$. The various values of j may
correspond to the analysts, and the various values of i to the heat treatments.
In effect, all the readings in a row, each taken by a different analyst,
can be used to estimate the effect of the heat treatment corresponding to 
that row. All the readings in a particular column can be used to estimate 
the systematic error associated with the observer who made the observa- 
tions in that column. It must be realized that, while it is easier to describe 
these ideas as has just been done, all that one can actually test is whether 
or not one heat treatment has a different effect from that of another, or 
whether there are differences in the systematic errors associated with 
the different analysts. 

Under the null hypothesis it is assumed that neither analyst nor heat 
treatment introduces any systematic (nonrandom) effect, and that there 
is a single answer $X_0$ for the elastic limit. Were the null hypothesis violated,
however, $X_{ij}$ could be written as

$$X_{ij} = X_0 + E_i + E_j + v_{ij}, \tag{12-58}$$

where $E_i$ is the fixed systematic “error” contributed by the ith heat treatment,
$E_j$ similarly the contribution by the jth analyst, and $v_{ij}$ is that
particular random error, drawn from the population which is associated
with the kind of experiment being performed and which happened to turn
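up at the ijth reading.

The mean of the jth column then would be

$$\bar{X}_j = \Big(X_0 + \frac{1}{n}\sum_i E_i\Big) + E_j + \frac{1}{n}\Big(\sum_i v_{ij}\Big); \tag{12-59}$$

and the sum of the squares of the residuals between each of the $\bar{X}_j$ and
their mean, which, of course, is the mean of all the readings, would be

$$\begin{aligned}
\sum_j \bar{X}_j^2 - \frac{1}{m}\Big(\sum_j \bar{X}_j\Big)^2
 ={}& \sum_j E_j^2 - \frac{1}{m}\Big(\sum_j E_j\Big)^2
   + \frac{1}{n^2}\Big[\sum_j\Big(\sum_i v_{ij}\Big)^2 - \frac{1}{m}\Big(\sum_j\sum_i v_{ij}\Big)^2\Big] \\
 & + \frac{2}{n}\Big[\sum_j E_j\Big(\sum_i v_{ij}\Big) - \frac{1}{m}\Big(\sum_j E_j\Big)\Big(\sum_j\sum_i v_{ij}\Big)\Big].
\end{aligned} \tag{12-60}$$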

We see that no sum of $v_{ij}^2$ appears in (12-60). Those sums of the $v_{ij}$
which do appear will be proportionately less as j increases, if the $E_j$ are
sufficiently large and nonrandom. Thus the numerical value of the right- 
hand side of Eq. (12-60) serves as an estimate of the sum of the squares 
of the residuals between the various Ej and their mean. Let us emphasize 
here that, in practice, one does not know any of the Ej but must estimate 
their mean and then calculate residuals as usual. A degree of freedom is 
lost in the process; the counting of degrees of freedom will be treated in 
the following discussion. 

A similar expression can be found for the sum of the squares of the 
residuals between the row means and their mean. 

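If the sum of squares of the residuals between each $X_{ij}$ and the mean
of all the $X_{ij}$ (the grand mean) is found, it will be

$$\begin{aligned}
\sum_i\sum_j X_{ij}^2 - \frac{1}{N}\Big(\sum_i\sum_j X_{ij}\Big)^2
 ={}& m\Big[\sum_i E_i^2 - \frac{1}{n}\Big(\sum_i E_i\Big)^2\Big]
   + n\Big[\sum_j E_j^2 - \frac{1}{m}\Big(\sum_j E_j\Big)^2\Big] \\
 & + \Big[\sum_i\sum_j v_{ij}^2 - \frac{1}{N}\Big(\sum_i\sum_j v_{ij}\Big)^2\Big] \\
 & + 2\Big[\sum_i E_i \sum_j v_{ij} - \frac{1}{n}\Big(\sum_i E_i\Big)\Big(\sum_i\sum_j v_{ij}\Big)\Big] \\
 & + 2\Big[\sum_j E_j \sum_i v_{ij} - \frac{1}{m}\Big(\sum_j E_j\Big)\Big(\sum_i\sum_j v_{ij}\Big)\Big].
\end{aligned} \tag{12-61}$$

Some of these terms are identical with those seen previously. If the sum
of the squares calculated for the column means is multiplied by n, and
the sum of squares of the residuals between the row means is multiplied
by m, and then if each of these is subtracted from the total as given in
Eq. (12-61), we will find

$$\Big[\sum_i\sum_j v_{ij}^2 - \frac{1}{N}\Big(\sum_i\sum_j v_{ij}\Big)^2\Big]
 - \frac{1}{n}\Big[\sum_j\Big(\sum_i v_{ij}\Big)^2 - \frac{1}{m}\Big(\sum_i\sum_j v_{ij}\Big)^2\Big]
 - \frac{1}{m}\Big[\sum_i\Big(\sum_j v_{ij}\Big)^2 - \frac{1}{n}\Big(\sum_i\sum_j v_{ij}\Big)^2\Big].$$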

Here none of the Ei and Ej appear. Thus this is the estimate of the 
random error. Again, every one of these terms except the first will become 
insignificantly small as the number of observations increases. 

The reason for multiplying by n and m before separation of the sums 
from the total sum of squares should be brought out. It is clear that when 
the total sum of squares of the residuals was calculated, each observation 
was given the same weight, which may be called unity. The weight of a 
column mean, however, is n since n readings were used in its determination. 
The sum of squares of readings of weight n is 1/n times that of the
same number of readings of unit weight, so that in order to be able to make 
the proper comparison, by subtraction in this case, between sums of 
squares of the same weight, we must convert all of them to unit weight. 
The same argument applies to the row means, except that n is replaced 
by m. 

Our discussion so far has been confined to the sums of the squares of 
the residuals. Since we are using the F-distribution, we must calculate
ratios of the form

$$F = \frac{q_1^2/f_1}{q_2^2/f_2},$$

where $f_1$ is the number of degrees of freedom associated with the sum of
squares $q_1^2$, and similarly for $f_2$ and $q_2^2$. In most applications (all of those
here, in fact) subscript 2 refers to the error estimate and subscript 1 refers
alternately to the columns and rows of an array like (12-57). In such an
application, the numbers of degrees of freedom can be easily obtained.
To get the total sum of squares, we must calculate the grand mean. Thus
the total sum of squares has a number of degrees of freedom

$$f_{\text{total}} = mn - 1.$$

When the sum of squares between column means is to be calculated,
we must first calculate, in principle at least, a mean of the column means.

Table 12-8

   i \ j                   1       2       3       4     $\bar{X}_i$   $\bar{X}_i - \bar{X}$
     1                     5       4       3       2        3.5            -1
     2                     6       5       4       3        4.5             0
     3                     7       6       5       4        5.5            +1
   $\bar{X}_j$             6       5       4       3     $\bar{X} = 4.5$
   $\bar{X}_j - \bar{X}$  +1.5    +0.5    -0.5    -1.5

$\sum\sum x_{ij}^2 - \tfrac{1}{12}\left(\sum\sum x_{ij}\right)^2 = 23$

$4\left[\sum \bar{x}_i^2 - \tfrac{1}{3}\left(\sum \bar{x}_i\right)^2\right] = 8, \qquad 3\left[\sum \bar{x}_j^2 - \tfrac{1}{4}\left(\sum \bar{x}_j\right)^2\right] = 15$


Table 12-9

   i \ j       1       2       3       4
     1       -0.1    +0.2    -0.4    +0.1
     2       +0.5     0      -0.4     0
     3       -0.1    +0.3    -0.3    +0.2


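Table 12-10

   i \ j                    1        2        3        4      $\bar{X}_i$   $\bar{X}_i - \bar{X}$
     1                     4.9      4.2      2.6      2.1       3.450          -1.050
     2                     6.5      5.0      3.6      3.0       4.525          +0.025
     3                     6.9      6.3      4.7      4.2       5.525          +1.025
   $\bar{X}_j$            6.100    5.167    3.633    3.100    $\bar{X} = 4.5$
   $\bar{X}_j - \bar{X}$  +1.600   +0.667   -0.867   -1.400

$\sum\sum x_{ij}^2 - \tfrac{1}{12}\left(\sum\sum x_{ij}\right)^2 = 26.060$

$4\left[\sum \bar{x}_i^2 - \tfrac{1}{3}\left(\sum \bar{x}_i\right)^2\right] = 8.615, \qquad 3\left[\sum \bar{x}_j^2 - \tfrac{1}{4}\left(\sum \bar{x}_j\right)^2\right] = 17.150$

26.060 - 8.615 - 17.147 = 0.295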


Table 12-11

   Source     f     $q^2$     $q^2/f$      F       $F_{99}$
   Total     11    26.06      2.369
   Column     3    17.150     5.717     116.20      9.78
   Row        2     8.615     4.308      87.56     10.9
   Error      6     0.295     0.0492

Thus this number of degrees of freedom is

$$f_{\text{col}} = m - 1,$$

and similarly,

$$f_{\text{row}} = n - 1.$$

The number of degrees of freedom left by which the error variance can be
estimated is

$$f_e = f_{\text{total}} - f_{\text{col}} - f_{\text{row}}.$$


The above description is convenient for use. But it might be more satisfying
to look at $f_e$, for instance, in another way. Let us count the actual
number of independent quantities that have been calculated. It is clear
that one of these is the grand mean. It is also evident that the mean of
the column means is the same as the grand mean. Thus, when $m - 1$
of them have been determined, the mth can be calculated from the first
$m - 1$ and the grand mean, so that only $m - 1$ of them are independent.
The same argument applies to the row means; only $n - 1$ of them are
independent. Thus the number of independent quantities that have been
calculated is $1 + (m - 1) + (n - 1)$, and, as usual, the number of degrees of
freedom for the error calculation is

$$f_e = mn - [1 + (m - 1) + (n - 1)].$$

As an introductory example, consider an array such as (12-57) where
there are very definite differences assignable to rows and columns, but
there is no error. Suppose $n = 3$, $m = 4$, $E_i = 1, 2, 3$, $E_j = 4, 3, 2, 1$
with the $X_0$ of Eq. (12-58) also equal to zero. Then the array becomes
Table 12-8. The total sum of squares, 23, was partitioned into 8 for the
differences between the means of the rows, and 15 for the differences
between the means of the columns. We see that an $\bar{X}$ of 4.5 was found,
emphasizing the fact that there is no way of determining the $X_0$ used in
the construction of the table. Correspondingly, individual values of the
$E_i$ and $E_j$ cannot be found. This example, where there is no error, shows
that the unit differences between the E’s show up in the differences between
individual row and column means.
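
The partition in this error-free example is easy to reproduce. The sketch below is an added illustration rather than part of the original calculation; the function name `partition` and the dictionary keys are arbitrary choices. It weights the row-mean residuals by m and the column-mean residuals by n, as described above.

```python
def partition(table):
    """Split the total sum of squares of an n-by-m table into row, column and error parts."""
    n, m = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * m)
    row_means = [sum(row) / m for row in table]
    col_means = [sum(table[i][j] for i in range(n)) / n for j in range(m)]
    total = sum((x - grand) ** 2 for row in table for x in row)
    rows = m * sum((r - grand) ** 2 for r in row_means)   # each row mean has weight m
    cols = n * sum((c - grand) ** 2 for c in col_means)   # each column mean has weight n
    return {"grand": grand, "total": total, "rows": rows, "cols": cols,
            "error": total - rows - cols}

E_i = [1, 2, 3]        # row effects
E_j = [4, 3, 2, 1]     # column effects
table_12_8 = [[ei + ej for ej in E_j] for ei in E_i]   # X_0 = 0 and no random error

ss = partition(table_12_8)
print(ss)
```

With no random error the remainder left for the error estimate is zero, and adding any constant $X_0$ to every entry changes only the grand mean, not the partition.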

Now suppose that the error assignments shown in Table 12-9 are made to 
the various ijth locations. The observation table is now as in Table 12-10. 

As a result of the partition shown in Table 12-10, we see that there is 
now something left over for an estimate of the error, so that it is possible 
to proceed with the elementary analysis as outlined in the table. The 
sum of the squares associated with row means, 8.615, has 2 degrees of 
freedom, so that the variance here is 4.308. The error sum of squares has 
6 degrees of freedom, so that the error variance is 0.0492. The F-ratio

Table 12-12

   Source     f     $q^2$     $q^2/f$     F      $F_{95}$   $F_{99}$
   Total     11    131.00     11.91
   Column     3     85.68     28.56     5.74      4.76       9.78
   Row        2     15.50      7.75     1.56      5.14      10.9
   Error      6     29.82      4.97

Table 12-13

   $L_1$   $L_2$   $L_3$   $L_4$   $L_5$   $L_6$   $L_7$
   1271    1268    1269    1267    1269    1267    1267
   1423    1419    1420    1420    1419    1416    1420
   1553    1549    1548    1550    1549    1550    1550
   1764    1761    1760    1760    1762    1764    1763
   1953    1951    1951    1950    1952    1953    1953
   2050    2043    2044    2043    2043    2048    2048
   2218    2215    2217    2216    2218    2220    2219


Table 12-14

DIFFERENCES BETWEEN NBS (1958) AND OTHER LABORATORIES (LAMP NO. 242)

   Current, amp   $L_1$   $L_2$   $L_3$   $L_4$   $L_5$   $L_6$   $L_7$
       8.62        -4      -1      -2       0      -2       0       0
       9.77        -3      +1       0       0      +1      +4       0
                   -3      +1      +2       0      +1       0       0
      12.99        -4      -1       0       0      -2      -4      -3
                   -3      -1      -1       0      -2      -3      -3
      16.16        -7       0      -1       0       0      -5      -5
      18.28        -2      +1      -1       0      -2      -4      -3


Table 12-15

   Source            f     $q^2$     $q^2/f$     F      $F_{95}$   $F_{99}$
   Total            48    205.06      4.27
   Lab (column)      6     75.63     12.60     5.92      2.36       3.35
   Current (row)     6     52.78      8.80     4.13      2.36       3.35
   Error            36     76.65      2.129

for the effect of possible systematic differences between rows is then
87.56. We now have to compare this ratio with the correct number in
Table D. The ratio given in the table for numerator degrees of freedom =
2 and denominator degrees of freedom = 6 is 5.14 at the 95% level.
This means that any value of F for these degrees of freedom less than
5.14 lies in the range that includes 95% of all the values of F, while one
as large as 87.56 lies in a range that includes only 5% of the values. Even
at the 99% level this limiting value is only 10.9, so that one can conclude 
that it is very improbable that a value as large as 87.56 would occur by 
chance. One reaches a similar conclusion, of course, concerning the 
column means. The results are summarized in Table 12-11. 
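
The whole analysis of Table 12-10 can be sketched in a few lines. This is an added illustration under stated assumptions: the function `two_way_anova` and its output layout are hypothetical names, and the critical values 10.9 and 9.78 are simply those quoted from Table D. Because the program carries full precision while the hand calculation rounds the means, the last digits of the column and error entries, and therefore of the F-ratios, can differ slightly from those tabulated.

```python
def two_way_anova(table):
    """Return (f, q2, q2/f, F) for the row, column, error and total lines of an n-by-m table."""
    n, m = len(table), len(table[0])                  # n rows, m columns
    grand = sum(map(sum, table)) / (n * m)
    row_means = [sum(r) / m for r in table]
    col_means = [sum(r[j] for r in table) / n for j in range(m)]
    ss_total = sum((x - grand) ** 2 for r in table for x in r)
    ss_row = m * sum((r - grand) ** 2 for r in row_means)
    ss_col = n * sum((c - grand) ** 2 for c in col_means)
    ss_err = ss_total - ss_row - ss_col
    f_total, f_row, f_col = n * m - 1, n - 1, m - 1
    f_err = f_total - f_row - f_col
    ms_err = ss_err / f_err
    return {
        "total": (f_total, ss_total, ss_total / f_total, None),
        "column": (f_col, ss_col, ss_col / f_col, ss_col / f_col / ms_err),
        "row": (f_row, ss_row, ss_row / f_row, ss_row / f_row / ms_err),
        "error": (f_err, ss_err, ms_err, None),
    }

table_12_10 = [
    [4.9, 4.2, 2.6, 2.1],
    [6.5, 5.0, 3.6, 3.0],
    [6.9, 6.3, 4.7, 4.2],
]
result = two_way_anova(table_12_10)

F99 = {"row": 10.9, "column": 9.78}                   # 1% critical values quoted from Table D
for source, (f, q2, ms, ratio) in result.items():
    ratio_text = "" if ratio is None else f"{ratio:9.2f}"
    print(f"{source:7s} {f:3d} {q2:9.3f} {ms:9.4f} {ratio_text}")
significant = {k: result[k][3] > F99[k] for k in F99}
print(significant)
```

Both ratios exceed the 1% critical values by a wide margin, which is the conclusion drawn above.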

As a further illustration, we can multiply the $v_{ij}$ of Table 12-9 by 10
before insertion. This makes the errors comparable to between-column 
and between-row differences. We leave it as an exercise for the reader to 
reproduce Table 12-12. In this table, there is still an indication of a dif- 
ference in the between-column effect. The observed value of F has a 
probability of occurrence by chance of around 4% when there is no real 
difference. The value of F for the between-row differences has a very 
high probability of occurring purely by chance when there is no real 
difference between the rows. 
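
For readers who wish to check their work on this exercise, a minimal sketch follows. It is an added illustration, not the authors' procedure: the error table is taken as the difference between Tables 12-10 and 12-8, and the helper name `f_ratios` and the `scale` parameter are hypothetical.

```python
base = [[5, 4, 3, 2], [6, 5, 4, 3], [7, 6, 5, 4]]                 # Table 12-8
v = [[-0.1, 0.2, -0.4, 0.1],                                       # Table 12-10 minus Table 12-8
     [0.5, 0.0, -0.4, 0.0],
     [-0.1, 0.3, -0.3, 0.2]]

def f_ratios(scale):
    """F-ratios (rows, columns) when the errors v are multiplied by `scale` before insertion."""
    table = [[b + scale * e for b, e in zip(brow, vrow)] for brow, vrow in zip(base, v)]
    n, m = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n * m)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_row = m * sum((sum(row) / m - grand) ** 2 for row in table)
    ss_col = n * sum((sum(row[j] for row in table) / n - grand) ** 2 for j in range(m))
    ms_err = (ss_total - ss_row - ss_col) / ((n * m - 1) - (n - 1) - (m - 1))
    return (ss_row / (n - 1)) / ms_err, (ss_col / (m - 1)) / ms_err

F_row, F_col = f_ratios(10)
print(f"F(rows) = {F_row:.2f}, F(columns) = {F_col:.2f}")
```

With the errors multiplied by 10, the column ratio lies between the 5% and 1% critical values (4.76 and 9.78), while the row ratio falls well below its 5% value of 5.14.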

We shall conclude this section with a real example.* In Table 12-13 
are listed the temperatures of the filament of a special type of lamp as 
observed by several national standardizing laboratories. Each row cor- 
responds to a particular current in the lamp. It is clear that there is no 
point in making between-row calculations; obviously there will be dif- 
ferences much larger than any errors that such laboratories might make. 
It is convenient then to use one of them, $L_4$ say, as a base and origin,
and subtract its observations from each of the others in the same row.
The result is Table 12-14, which is much more manageable. The analysis
of the variance is shown in Table 12-15. The fact that the Lab ratio
exceeds the critical F-ratios strongly suggests that there is a systematic
difference between these laboratories which is larger than the error made
in the observation of a temperature at any one of the laboratories. It is
seen also that the between-current to error F-ratio exceeds the critical
F-ratios. Moreover, Table 12-14 shows an odd distribution of values in
the rows, particularly noticeable in columns $L_6$ and $L_7$. This kind of
distribution demands analysis of a degree of sophistication beyond this 
book, though the simple ratio did point out that there is something there 
worth examining. Thus even the simplest variance analysis was very 
useful in a complex case. 
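
The entries of Table 12-15 can be recomputed from the differences of Table 12-14. The sketch below is an added check, not part of the original analysis. It assumes that each difference is the $L_4$ reading minus the other laboratory's reading in the same row of Table 12-13, and that any entry not legible in the table is the value implied by that rule; the variable names are illustrative.

```python
# Rows are lamp currents, columns are laboratories L1 ... L7 (differences in degrees).
diffs = [
    [-4, -1, -2, 0, -2,  0,  0],
    [-3,  1,  0, 0,  1,  4,  0],
    [-3,  1,  2, 0,  1,  0,  0],
    [-4, -1,  0, 0, -2, -4, -3],
    [-3, -1, -1, 0, -2, -3, -3],
    [-7,  0, -1, 0,  0, -5, -5],
    [-2,  1, -1, 0, -2, -4, -3],
]
n, m = len(diffs), len(diffs[0])                 # 7 currents, 7 laboratories
N = n * m
correction = sum(map(sum, diffs)) ** 2 / N       # (grand total)^2 / N

ss_total = sum(d * d for row in diffs for d in row) - correction
ss_current = sum(sum(row) ** 2 for row in diffs) / m - correction
ss_lab = sum(sum(row[j] for row in diffs) ** 2 for j in range(m)) / n - correction
ss_error = ss_total - ss_current - ss_lab

f_lab, f_current = m - 1, n - 1
f_error = (N - 1) - f_lab - f_current
ms_error = ss_error / f_error
F_lab = (ss_lab / f_lab) / ms_error
F_current = (ss_current / f_current) / ms_error

print(f"total   {N - 1:3d} {ss_total:8.2f}")
print(f"lab     {f_lab:3d} {ss_lab:8.2f}  F = {F_lab:.2f}")
print(f"current {f_current:3d} {ss_current:8.2f}  F = {F_current:.2f}")
print(f"error   {f_error:3d} {ss_error:8.2f}  mean square = {ms_error:.3f}")
```

Both ratios exceed the 1% critical value of 3.35 listed in Table 12-15 for 6 and 36 degrees of freedom.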


* R. J. Thorn and G. H. Winslow, Rev. Sci. Instr., 33, 961 (1962).

12-8 THE TWO-DIMENSIONAL NORMAL DISTRIBUTION; THE CORRELATION COEFFICIENT

It was mentioned in the introduction to this chapter that some distributions 
are two dimensional. In this section we shall attempt to face this problem, 
but our discussion will be severely limited compared to what could be 
said on this subject or its natural extension to multidimensional distribu- 
tions. Since any presentation of this topic easily gets far beyond the scope 
of this text, we shall confine ourselves to the parent distribution for two 
normally distributed variables which are said to be in linear correlation. 
Moreover, we shall restrict our discussion even for this case. We will not 
use the parent two-dimensional distribution to derive any distribution 
functions for finite-sized samples in the way in which we used a one-dimensional
parent normal distribution to derive the $\chi^2$, Student’s
t, and the F-ratio distributions. Rather, we will present this distribution 
only for the purpose of giving the reader an introduction to the implications 
of multidimensional distributions. We also wish to emphasize the distinc- 
tion between that parameter of the parent two-dimensional distribution 
which is called the correlation coefficient and a quantity calculated for a 
finite sample which is called by the same name, and hence to show why 
the latter is defined as it is. The derivation of the distribution of the sample 
correlation coefficient can be carried out without reference to the parent 
two-dimensional coefficient, and this is what we will do. 

As stated above, we are concerned with the observations of two variables, 
taken in pairs. Let us suppose that the relation between them is of the 
linear form 

y = A + Bx (12-62) 

and review the meaning which this form has had for us up to this point. 

We have viewed the variable x as a nonrandom variable, the value of
which could be established at will during the course of an experiment.
We observed values of y at various values of x in order to gather sufficient
information for estimating to some desired and calculable degree of precision
values of the parameters A and B. We found that repeated observations
of y at the same x could not be expected to yield the same results.
It is evident then that we assumed that at each value of x, there was some
mean value $\mu$ of y which we wanted to estimate by performing the experiment.
In the context of previous discussions, the “true” relation was
assumed to be

$$\mu = A_0 + B_0 x. \tag{12-63}$$

In the following discussion we will also see that we have assumed the 
standard deviation in y to be the same at each value of x. Thus we have
been studying a series of one-dimensional distributions of y where each
such distribution had the same standard deviation and a mean determined 
by the assigned value of x. 

The distinction between the above situation and the one in which both 
x and y must be considered as random variables can be most easily under- 
stood through an illustration. Let us suppose now that we are not per- 
forming a controlled experiment but are merely observing pairs, (x, y), 
where the occurrence of particular values of either is entirely beyond our 
control. Such a situation might be illustrated by the relation between the 
height and weight of 25-year-old men used as an example in the introduc- 
tion to this chapter, or by the relation between the temperature and the 
total runs per game in big-league baseball, where it might be supposed 
that there is a tendency toward fewer runs at higher temperatures, or by 
the relation between student grades in mathematics and in physics or 
chemistry, and so on. In the first and third of these examples it is clear 
that there is no basis on which to say that a change in one of the variables 
causes a change in the other in the same way that we can say that a change 
in temperature causes a change in vapor pressure. Neither can this be 
said in the case of the second example since air temperature and humidity 
tend to vary together, and the latter might be expected to have as great 
an effect on the number of runs per game as the former. 

Thus in cases like these, each variable must be considered as random. 
In deriving the following particular distribution, we assume that each of 
the variables, considered separately, is normally distributed. That is, we 
can assume some single mean value $\mu_y$ of y, and a standard deviation
$\sigma_y$ in y; and similarly for x. Further, we can also apply remarks similar
to our earlier ones about the existence of a normal distribution of values 
of y for a given value of x by extending them to the existence of a normal 
distribution of values of x for a given value of y. 

For the sake of convenience in later descriptions, let us introduce a new 
term. When we plot y as a function of x according to Eq. (12-62), we speak 
of the line through the points as the line of regression of y on x, and con- 
versely, we speak of the line of regression of x on y. An examination of 
Eq. (10-7), which is one of the equations used to determine A and B for 
a finite number of observations, shows that we can write this equation 
as 

$$\bar{y} = A + B\bar{x}.$$

For the often suggested infinite number of observations, this would
become

$$\mu_y = A_0 + B_0\mu_x, \tag{12-64}$$

where $A_0$ and $B_0$ are the true values for the line of regression of y on x.
If we examined the line of regression of x on y, we would similarly
find

$$\mu_x = a_0 + b_0\mu_y, \tag{12-65}$$

where $\mu_x$ and $\mu_y$ are the same in Eqs. (12-64) and (12-65), but it is not
necessarily true that

$$a_0 = -\frac{A_0}{B_0}, \qquad b_0 = \frac{1}{B_0}. \tag{12-66}$$

Consider the probability of observing a particular ( x , y) pair in the 
context of the line of regression of y on x. For a particular value of x, the 
mean value of y is 

$$\mu = \mu_y + B_0(x - \mu_x), \tag{12-67}$$

where we have used Eq. (12-64) to eliminate $A_0$. We let the standard
deviation for the distribution of the values of y at this value of x be
represented by $\sigma$, and leave its relationship to other parameters of the
distribution to be determined later. Then the probability of obtaining a
particular value of y at this value of x is

$$\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right]dy,$$

where $\mu$ is given by Eq. (12-67). The probability of observing this value
of x is

$$\frac{1}{\sqrt{2\pi}\,\sigma_x}\exp\left[-\frac{(x - \mu_x)^2}{2\sigma_x^2}\right]dx,$$

so that the probability of observing this particular pair is

$$d\Phi = \frac{1}{2\pi\sigma\sigma_x}\exp\left\{-\frac{(x - \mu_x)^2}{2\sigma_x^2} - \frac{[y - \mu_y - B_0(x - \mu_x)]^2}{2\sigma^2}\right\}dx\,dy.$$

If we now consider the probability of obtaining this same (x, y) pair
via a discussion of the line of regression of x on y, we will obtain

$$d\Phi = \frac{1}{2\pi\sigma'\sigma_y}\exp\left\{-\frac{(y - \mu_y)^2}{2\sigma_y^2} - \frac{[x - \mu_x - b_0(y - \mu_y)]^2}{2\sigma'^2}\right\}dx\,dy,$$

where $\sigma'$ is the standard deviation for the distribution of the values of x
at this value of y. Since both these expressions refer to the same (x, y)
pair, they must be equal. If they are to be equal for any (x, y) pair, not
only must the arguments of the exponential functions be equal but, in
those arguments, the coefficients of like powers of $(x - \mu_x)$ and $(y - \mu_y)$
must be equal. Therefore we must have, as the reader can show,

$$\frac{1}{\sigma_x^2} + \frac{B_0^2}{\sigma^2} = \frac{1}{\sigma'^2}, \tag{12-68}$$

$$\frac{B_0}{\sigma^2} = \frac{b_0}{\sigma'^2}, \tag{12-69}$$

$$\frac{1}{\sigma^2} = \frac{1}{\sigma_y^2} + \frac{b_0^2}{\sigma'^2}. \tag{12-70}$$

If Eq. (12-62) merely represented an algebraic relationship, there would 
be no scattering or grouping of the points along a straight line, but instead 
every pair would be exactly on the same line. Then Eqs. (12-66) would 
be satisfied and the product $B_0 b_0$ would be equal to unity. This would
still be true if the line y vs. x approached the vertical or the horizontal.
On the other hand, if the scatter of the infinite number of pairs were such
that the points were uniformly distributed in the plane, both $B_0$ and $b_0$
would be zero and the product would be zero. Thus we are led to define a
fifth parameter for the two-dimensional distribution in addition to $\mu_x$,
$\sigma_x$, $\mu_y$, and $\sigma_y$. This parameter, called the correlation coefficient, is defined
by

$$\rho^2 = B_0 b_0, \tag{12-71}$$

where the value of $\rho^2$ ranges from 0 to 1. When $\rho$ is 0, we say that there is
no correlation; when it is 1, we say that the correlation is perfect.

If Eq. (12-69) is multiplied through by $B_0$ and Eq. (12-71) is substituted
into it, we will obtain

$$\frac{B_0^2}{\sigma^2} = \frac{\rho^2}{\sigma'^2},$$

so that from Eq. (12-68) we find that

$$\sigma' = \sigma_x\sqrt{1 - \rho^2}. \tag{12-72}$$

For the positive square root of $B_0^2$,

$$B_0 = \frac{\rho\,\sigma}{\sigma_x\sqrt{1 - \rho^2}}.$$

Since $\sigma$ and $\sigma_x$ are necessarily positive and since $-1 \le \rho \le 1$ when
$0 \le \rho^2 \le 1$, we can speak of negative correlation if one variable tends to
increase as the other decreases, or of positive correlation if they either
increase or decrease together.

Similarly,

$$\sigma = \sigma_y\sqrt{1 - \rho^2}. \tag{12-73}$$

From Eqs. (12-72) and (12-73) we obtain

$$\sigma'\sigma_y = \sigma\sigma_x,$$

so that the coefficients of the exponential functions in the two alternative
expressions for the probability of observing a particular pair are also equal.
Therefore, by using either of these expressions with appropriate substitutions,
we find the desired two-dimensional normal distribution for variables
in linear correlation:

$$d\Phi = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}}\exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(x - \mu_x)^2}{\sigma_x^2} - \frac{2\rho(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y} + \frac{(y - \mu_y)^2}{\sigma_y^2}\right]\right\}dx\,dy. \tag{12-74}$$


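We must, however, verify the normalization by integrating $d\Phi$ over all
values of x and of y. The troublesome cross product $(x - \mu_x)(y - \mu_y)$
can be eliminated if we make the substitutions:

$$\frac{x - \mu_x}{\sigma_x} = \sqrt{\frac{1 + \rho}{2}}\,u + \sqrt{\frac{1 - \rho}{2}}\,v, \qquad
\frac{y - \mu_y}{\sigma_y} = \sqrt{\frac{1 + \rho}{2}}\,u - \sqrt{\frac{1 - \rho}{2}}\,v.$$

The reader should show that the Jacobian for this change of variables is
$\sigma_x\sigma_y\sqrt{1 - \rho^2}$ so that

$$\int_{\text{all } x}\int_{\text{all } y} d\Phi = \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left(-\frac{u^2}{2}\right)du\int_{-\infty}^{\infty}\exp\left(-\frac{v^2}{2}\right)dv,$$

which is unity.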

It is easy to see that if p = 0, Eq. (12-74) becomes the product of a 
normal distribution for x and a normal distribution for y. That is, x and 
y are completely independent. There is no correlation between them, no 
mutual influence of one on the other, and the (x, y) pairs observed together 
would tend to fill the (x, y)-plane uniformly with points.

The easiest way to see the effect of perfect correlation is by looking at 
Eqs. (12-72) and (12-73). If $\rho = \pm 1$, then $\sigma = \sigma' = 0$. That is, for a
particular value of x there is no scatter in possible values of y, and vice 
versa. All (x, y) pairs lie exactly on the same line. 
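
The normalization of the two-dimensional normal distribution, and its factoring into two one-dimensional normal distributions when the correlation coefficient is zero, can both be checked numerically. The sketch below is an added illustration using a simple midpoint rule; the parameter values and the names `density`, `total_probability`, and `normal` are arbitrary choices made here, not taken from the text.

```python
import math

def density(x, y, mu_x, mu_y, sig_x, sig_y, rho):
    """Two-dimensional normal density for variables in linear correlation."""
    zx = (x - mu_x) / sig_x
    zy = (y - mu_y) / sig_y
    quad = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho * rho)
    norm = 2 * math.pi * sig_x * sig_y * math.sqrt(1 - rho * rho)
    return math.exp(-quad / 2) / norm

def total_probability(rho, mu_x=1.0, mu_y=-2.0, sig_x=1.5, sig_y=0.5,
                      half_width=8.0, steps=240):
    """Midpoint-rule integral of the density over +/- half_width standard deviations."""
    hx = 2 * half_width * sig_x / steps
    hy = 2 * half_width * sig_y / steps
    total = 0.0
    for i in range(steps):
        x = mu_x - half_width * sig_x + (i + 0.5) * hx
        for j in range(steps):
            y = mu_y - half_width * sig_y + (j + 0.5) * hy
            total += density(x, y, mu_x, mu_y, sig_x, sig_y, rho)
    return total * hx * hy

def normal(t, mu, sig):
    """One-dimensional normal density."""
    return math.exp(-((t - mu) / sig) ** 2 / 2) / (math.sqrt(2 * math.pi) * sig)

for rho in (0.0, 0.8, -0.5):
    print(rho, round(total_probability(rho), 6))

# With rho = 0 the joint density is the product of the two separate densities.
joint = density(0.3, -1.7, 1.0, -2.0, 1.5, 0.5, 0.0)
product = normal(0.3, 1.0, 1.5) * normal(-1.7, -2.0, 0.5)
print(joint, product)
```

The integral should be unity for any admissible correlation coefficient, which is the content of the verification just carried out.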

The experienced reader will have observed that it would be most un- 
usual for all (x, y) pairs in a real, finite sample to lie exactly on the same 
straight line. Were it to happen, the most likely explanation would be
as discussed in Section 10-9; the scales used were too coarse. Thus in any
real observational situation where a relationship of the form of Eq. (12-62) 
is known or expected to exist between two variables, it is highly unlikely 
that a finite-sized sample will show perfect correlation. We are then 
faced with the problem of calculating a correlation coefficient for such a 
sample and discussing its meaning once it is calculated. 

The method of calculating the correlation coefficient for a real sample 
follows naturally from Eq. (12-71). For the case in question the latter is 
rewritten as 

r 2 = Bb, (12-75) 


where r is the correlation coefficient for the sample, B is the least-squares 
value of the slope of the line of regression of y on x, and b is the least- 
squares value of the slope of the line of regression of x on y; all these are 
for the same set of data, of course. Reference to Eqs. (10-10), (12-18), 
and (12-20) shows that if the numerator and denominator in the first of 
these are divided by n, the equation can be written as 


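$$B = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{q_x^2}, \tag{12-76}$$

where

$$q_x^2 = \sum (x_i - \bar{x})^2.$$

Similarly,

$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{q_y^2}.$$

Thus

$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{q_x q_y}, \tag{12-77}$$

or, directly in terms of the observations,

$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}. \tag{12-78}$$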
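
The equivalence of the definition $r^2 = Bb$ and the direct formula (12-78) is easily confirmed on a small sample. The sketch below is an added illustration; the five (x, y) pairs are invented for the purpose and have no connection with any data in the text.

```python
import math

# Hypothetical sample, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

cross = sxy - sx * sy / n      # sum of x_i y_i minus n * xbar * ybar
qx2 = sxx - sx * sx / n        # sum of squared residuals of x about its mean
qy2 = syy - sy * sy / n        # sum of squared residuals of y about its mean

B = cross / qx2                # slope of the line of regression of y on x
b = cross / qy2                # slope of the line of regression of x on y

r_from_slopes = math.copysign(math.sqrt(B * b), B)    # r^2 = B b, sign taken from B
r_direct = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

print(B, b, r_from_slopes, r_direct)
```

The product Bb never exceeds unity, and it equals unity only when every point lies exactly on one straight line.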


A derivation of the distribution of r based on Eq. (12-74) is even more 
tedious than those we have faced in the previous sections of this chapter. 
Such a derivation has been given by Uspensky [14], for instance, and will 
not be reproduced here. Unfortunately, even this derivation leads to a 
result which specifically depends on the value of the unknown “true” 
correlation coefficient p for the distribution. 

On the other hand, the more common test, and the one for which tables 
can be found most easily, is based on a study of the probability of observing 
a value of r less than or equal to some arbitrary value R when p is zero, 
i.e., when there is no correlation in the parent distribution. We can 
arrive at this parent distribution by discussing the line of regression of y
on x, treating x as a nonrandom variable. Then the mean of y at each
value of x is given by Eq. (12-63), and we assume the standard deviation
$\sigma$ in y to be the same at each value of x. The probability of obtaining the
n observations is

$$d\Phi = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n}\exp\left[-\frac{\sum (y_i - A_0 - B_0 x_i)^2}{2\sigma^2}\right]\prod_{i=1}^{n} dy_i. \tag{12-79}$$


The least-squares value of the slope is given by Eq. (12-76); instead of 
using the least-squares intercept, we prefer to eliminate it through 


$$A = \bar{y} - B\bar{x}.$$

Corresponding to the sums of squares of residuals designated previously by
$q^2$, we define

$$Q^2 = \sum (y_i - A - Bx_i)^2 = \sum [(y_i - \bar{y}) - B(x_i - \bar{x})]^2. \tag{12-80}$$

In the expansion of the right-hand side of Eq. (12-80) we find the term
$\sum (x_i - \bar{x})(y_i - \bar{y})$. The reader can show that this is simply $Bq_x^2$ so that

$$Q^2 = q_y^2 - B^2 q_x^2. \tag{12-81}$$

With this definition of $Q^2$, we see that the sum of squares of the real errors
which appears in the exponential function of Eq. (12-79) is

$$\sum (\text{error})^2 = Q^2 + q_x^2(B - B_0)^2 + n(\bar{y} - A_0 - B_0\bar{x})^2. \tag{12-82}$$
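
The decomposition in Eqs. (12-81) and (12-82) can be verified numerically. The sketch below is an added illustration; the observations and the "true" values `A0` and `B0` are invented for the purpose and are not taken from the text.

```python
# Hypothetical observations and assumed true parameters, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
A0, B0 = 1.0, 2.0
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
qx2 = sum((a - xbar) ** 2 for a in x)
qy2 = sum((b - ybar) ** 2 for b in y)

B = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / qx2   # least-squares slope
A = ybar - B * xbar                                            # least-squares intercept

Q2 = sum((b - A - B * a) ** 2 for a, b in zip(x, y))           # Eq. (12-80)
real_errors = sum((b - A0 - B0 * a) ** 2 for a, b in zip(x, y))
decomposed = Q2 + qx2 * (B - B0) ** 2 + n * (ybar - A0 - B0 * xbar) ** 2

print(Q2, qy2 - B * B * qx2)        # the two forms of Q^2, Eq. (12-81)
print(real_errors, decomposed)      # the two sides of Eq. (12-82)
```

The three parts on the right of Eq. (12-82) involve Q, B, and the mean of y separately, which is what makes the change of variables that follows worthwhile.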
We can proceed now as previously. That is, let

$$u_n = Q, \qquad u_{n-1} = \bar{y}, \qquad u_{n-2} = B, \qquad
u_i = \frac{(y_i - \bar{y}) - B(x_i - \bar{x})}{Q}, \quad i = 1, \ldots, n - 3.$$

We must now write the old variables in terms of the new ones. The first
$n - 3$ are easy: $y_i = \bar{y} + B(x_i - \bar{x}) + Qu_i$, $i = 1, \ldots, n - 3$. By procedures
similar to those used in Section 12-5 we find

$$(y_n - \bar{y}) + (y_{n-1} - \bar{y}) + (y_{n-2} - \bar{y}) = -Q\sum u + B[(x_n - \bar{x}) + (x_{n-1} - \bar{x}) + (x_{n-2} - \bar{x})],$$

$$x_n(y_n - \bar{y}) + x_{n-1}(y_{n-1} - \bar{y}) + x_{n-2}(y_{n-2} - \bar{y}) = -Q\sum xu + B[x_n(x_n - \bar{x}) + x_{n-1}(x_{n-1} - \bar{x}) + x_{n-2}(x_{n-2} - \bar{x})],$$

$$[(y_n - \bar{y}) - B(x_n - \bar{x})]^2 + [(y_{n-1} - \bar{y}) - B(x_{n-1} - \bar{x})]^2 + [(y_{n-2} - \bar{y}) - B(x_{n-2} - \bar{x})]^2 = Q^2\Big(1 - \sum u^2\Big),$$

where the sums on the right-hand sides of these equations are over the
indices $i = 1, \ldots, n - 3$.

For convenience in writing, let

$$(y_{n-j} - \bar{y}) - B(x_{n-j} - \bar{x}) = Y_{n-j}, \qquad j = 0, 1, 2.$$

Then the previous three equations become

$$Y_n + Y_{n-1} + Y_{n-2} = -Q\sum u, \tag{12-83}$$

$$x_n Y_n + x_{n-1} Y_{n-1} + x_{n-2} Y_{n-2} = -Q\sum xu, \tag{12-84}$$

$$Y_n^2 + Y_{n-1}^2 + Y_{n-2}^2 = Q^2\Big(1 - \sum u^2\Big). \tag{12-85}$$

Equations (12-83) and (12-84) can be solved for $Y_{n-1}$ and $Y_{n-2}$ to yield

$$Y_{n-1} = \frac{Y_n(x_n - x_{n-2}) + Q\left(\sum xu - x_{n-2}\sum u\right)}{x_{n-2} - x_{n-1}}, \tag{12-86}$$

$$Y_{n-2} = \frac{Y_n(x_n - x_{n-1}) + Q\left(\sum xu - x_{n-1}\sum u\right)}{x_{n-1} - x_{n-2}}. \tag{12-87}$$

There is no point in doing more algebra than is necessary. Examination of 
these equations in the light of our previous experience shows that if 
Eqs. (12-86) and (12-87) are substituted into Eq. (12-85), the result will 
be of the form 

$$M_1 Y_n^2 + 2M_2 Y_n Q + M_3 Q^2 = 0, \tag{12-88}$$

where the coefficients $M_1$, $M_2$, $M_3$ do not contain any of the three quantities 
$\bar{y}$, B, or Q. The solution of Eq. (12-88) will be of the form 

$$Y_n = \frac{Q}{M_1}\left(-M_2 \pm \sqrt{M_2^2 - M_1 M_3}\right).$$

But we see from Eqs. (12-86) and (12-87) that if $Y_n$ is proportional to Q, 
then so are $Y_{n-1}$ and $Y_{n-2}$, and $\bar{y}$, B, and Q are not in the constants of 
proportionality. In fact, we can now write 

$$y_n = \bar{y} + B(x_n - \bar{x}) + N_n Q,$$
$$y_{n-1} = \bar{y} + B(x_{n-1} - \bar{x}) + N_{n-1} Q,$$
$$y_{n-2} = \bar{y} + B(x_{n-2} - \bar{x}) + N_{n-2} Q,$$

where the coefficients of Q contain only the values of x and the variables 
$u_i$ for i < n - 2. The Jacobian for the variable change $y_i \to u_i$ is then 
of the same form as found in the derivation of the $\chi^2$-distribution; the only 
difference is in the power of Q which can be divided out of it. Hence we 
see that the Jacobian can be written as $CQ^{n-3}$ so that Eq. (12-79) becomes 

$$d\Phi = \exp\left\{-\frac{1}{2\sigma^2}\left[Q^2 + q_x^2(B - B_0)^2 + n(\bar{y} - A_0 - B_0\bar{x})^2\right]\right\} \times Q^{n-3}\, d\bar{y}\, dB\, dQ\; C \prod_{i=1}^{n-3} du_i.$$

Here, as with the derivation of the $\chi^2$-distribution, we will normalize the 
final distribution of interest rather than derive the normalization factor 
from an expression for C. We note in passing, however, that $\bar{y}$ is normally 
distributed, with mean $A_0 + B_0\bar{x}$ and standard deviation $\sigma/\sqrt{n}$. Furthermore, 
B is seen to be normally distributed, with mean $B_0$ and standard 
deviation $\sigma/q_x$. Now, it is indeed B we are interested in, but we wish to 
find a measure of its distribution which is independent of the unknown $\sigma$. 
We also recall that two quantities A and B were determined from the 
data so that the number of degrees of freedom is 

$$f = n - 2.$$

Thus out of all the terms in $d\Phi$, we save only those variable factors which 
are of interest: 

$$d\Phi' = Q^{f-1} \exp\left\{-\frac{Q^2}{2\sigma^2}\left[1 + \frac{q_x^2(B - B_0)^2}{Q^2}\right]\right\} dB\, dQ.$$

We get rid of the unknown $\sigma$ by making the change of variables 

$$q = Q, \qquad t = \frac{\sqrt{f}\, q_x (B - B_0)}{Q},$$

where we are led to include the factor $\sqrt{f}$ in the definition of this t because 
the form of $d\Phi'$ shows something close to Student's t distribution coming 
up; we can cast $d\Phi'$ immediately into the form of this known distribution 
by defining t in this way. 

In terms of the new variables the old ones are 

$$Q = q, \qquad B = B_0 + \frac{qt}{q_x\sqrt{f}},$$

and the Jacobian is $q/(q_x\sqrt{f})$. If we again preserve only the variable factors 
of interest in $d\Phi'$, we will have 

$$q^{f} \exp\left[-\frac{q^2}{2\sigma^2}\left(1 + \frac{t^2}{f}\right)\right] dq\, dt.$$

We next integrate over all values of q to obtain that distribution of t 
which is independent of any particular value of Q. We need go no further. 
Reference to the derivation of Eq. (12-51) and to the definitions of Q 
and $q_x$ shows that the quantity 

$$t = (B - B_0)\left\{\frac{(n - 2)\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]}{n\sum (y_i - A - Bx_i)^2}\right\}^{1/2} \tag{12-89}$$

has Student's t distribution. But the right-hand side of Eq. (12-89) is a 
familiar combination. Equations (11-26), (11-28), (11-29), and (11-32) 
show that Eq. (12-89) can be written as 

$$t = \frac{B - B_0}{\sigma_B}, \tag{12-90}$$

where $\sigma_B$ is the previously defined estimate of the standard deviation of 
the slope, calculated on the basis of what was called external consistency, 
for uniformly weighted observations. 

Equations (12-89) and (12-90) contain the unknown B 0 . It was men- 
tioned that the most common correlation test is based on the hypothesis 
that B 0 is zero. That is, we ask how likely it is that the observed B would 
occur when actually B 0 is zero. If it turns out to be very unlikely, the 
hypothesis is rejected and we conclude that there is correlation. The 
decision we reach depends on the probability level at which the test is 
made; i.e., the degree of risk of making an erroneous decision that we feel 
we can stand. 
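
The test just described can be sketched in a few lines. The sketch below evaluates the least-squares slope and then t from Eq. (12-89) under the hypothesis $B_0 = 0$; the function name `slope_t` is an assumption made for illustration, and the data are those of Problem 24 at the end of this chapter, with tensile strength taken as x and hardness as y.

```python
import math

def slope_t(x, y, b0=0.0):
    """Return (B, t), with t computed from Eq. (12-89); f = n - 2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx * sx                      # n*sum(x^2) - (sum x)^2
    b = (n * sxy - sx * sy) / d                # least-squares slope
    a = (sy - b * sx) / n                      # A = ybar - B*xbar
    resid2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    t = (b - b0) * math.sqrt((n - 2) * d / (n * resid2))
    return b, t

tensile = [377, 247, 348, 298, 287, 292, 345, 380, 257, 258]
hardness = [70, 56, 86, 60, 72, 51, 88, 95, 51, 75]
b, t = slope_t(tensile, hardness)
print(f"B = {b:.4f}, t = {t:.2f}, f = {len(tensile) - 2}")
```

The resulting t is then compared with the tabulated value of Student's t for f = n - 2 at the chosen probability level.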

Before showing the relationship between this t-distribution and the 
distribution of the correlation coefficient, we will consider a real, physical 
example. The example is fairly involved, but for that reason, it is illus- 
trative of the use of statistical analyses beyond the level discussed in 
Chapters 9, 10, and the early sections of Chapter 11, and is used to en- 
courage the reader to work beyond that level. 

The example involves the study of a model for the solution of additional 
oxygen in solid uranium dioxide. We will not burden the reader with the 
theory involved, but will ask him to take the equations on faith. One can 
derive from the theory an equation 


Y = a + bx, 


(12-91) 


where x is the ratio of the number of added atoms of oxygen to the number 
of uranium atoms present, and a and b are supposed to be constant if the 
model fits the facts. The function Y turns out to be of the form 


Y = —F% t + 2 RT 




+ C 


+ f(T), (12-92) 


where f(T) is a known function of the temperature T, C and a are constant 
if the model fits the facts, R is the gas constant, $-F_{\mathrm{O}_2}$ is a quantity 
measured by the experimenter at various values of T for a given value of 
x, and the measurement is done at various values of x. We wish to see 
whether the model fits the facts with a = 1. There are no restrictions 
on values of C, a, or b, though the investigator has ideas as to what he 
thinks are reasonable ranges in which these numbers should fall. 

What is to be examined here is the fact that Y appears to depend on 
the temperature according to Eq. (12-92), but Eq. (12-91) says that it 
should be independent of the temperature. If these two facts cannot be 
brought into agreement, the model is a failure. 

The problem is examined by calculating B = dY/dT and $\sigma_B$ by the 
standard least-squares procedures, and t from Eq. (12-90) with the assumption 
that $B_0 = 0$. The results for a few representative observations are 
given in Table 12-16 along with values of t that would be expected to be 
exceeded 5% and 1% of the time if B 0 actually were zero. The hypothesis 
is verified at neither level for x at 0.026, 0.158, or 0.172, at the 1% level 
but not at the 5% level for x at 0.112, 0.085, and 0.114; it is verified at 
both levels for all other values of x. This is the conclusion reached by a 
more detailed examination of a greater mass of data; that is, the model 
is satisfactory for x < ~0.08 and fails for x > ~0.08. This more detailed 
examination also verifies another conclusion implied above: something 
went wrong experimentally at x = 0.026. 

                               Table 12-16 

    x            f    dY/dT         t        t_0.05   t_0.01   Ref. 

    [illegible]  1    [illegible]     1.54   12.71    63.66    a 
    0.078        1    [illegible]     8.64   12.71    63.66    a 
    0.112        1    -2.731         34.65   12.71    63.66    a 
    0.030        6     0.507          0.41    2.45     3.71    b 
    0.033        3     0.503          2.58    3.18     5.84    b 
    0.026        4     8.636          4.73    2.78     4.60    c 
    0.037        2     0.664          1.12    4.30     9.92    c 
    0.085        4    -1.624          3.84    2.78     4.60    c 
    0.114        1     0.577         24.36   12.71    63.66    d 
    0.158        1    -7.624        107.99   12.71    63.66    d 
    0.172        1    -8.360        348.87   12.71    63.66    d 

a. Aronson and Belle, J. Chem. Phys., 29, 151 (1958). 

b. Markin and Bones, Atomic Energy Research Establishment Report AERE-R4178 (1962). 

c. Kiukkola, Acta Chem. Scand., 16, 327 (1962). 

d. Roberts and Walter, J. Inorg. Nuclear Chem., 22, 213 (1961). 

We have shown above how to make the usually desired test on problems 
of correlation by using the t-distribution. Nevertheless, the reader should 
be familiar with the relationship between this distribution and the 
distribution of the sample correlation coefficient given by Eq. (12-77). 

If Eq. (12-89) is rewritten in terms of Q and $q_x$ for $B_0 = 0$, it becomes 

$$t = B\,\frac{\sqrt{f}\, q_x}{Q}.$$

From Eqs. (12-76) and (12-77) we find that 

$$Bq_x = rq_y,$$

so that from Eq. (12-81) we obtain 

$$Q = q_y\sqrt{1 - r^2} = \frac{Bq_x\sqrt{1 - r^2}}{r}.$$

Hence 

$$t = \frac{r\sqrt{f}}{\sqrt{1 - r^2}}. \tag{12-93}$$

Thus, for example, the critical value of r for f = 4 that would be expected 
to be exceeded 1% of the time for $\rho = 0$, calculated from the 
corresponding $t_{0.01} = 4.60$, is 0.917. 
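
Solving Eq. (12-93) for r gives $r = t/\sqrt{f + t^2}$, which turns any tabulated critical value of t into the corresponding critical value of r. A minimal sketch follows; the function names are illustrative assumptions.

```python
import math

def t_from_r(r, f):
    """Eq. (12-93): t for a sample correlation coefficient r with f degrees of freedom."""
    return r * math.sqrt(f) / math.sqrt(1.0 - r * r)

def r_from_t(t, f):
    """Eq. (12-93) solved for r."""
    return t / math.sqrt(f + t * t)

r_crit = r_from_t(4.60, 4)   # t_0.01 for f = 4, as quoted in the text
print(round(r_crit, 3))      # 0.917
```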

In order to see what the distribution of r looks like, we make in the 
t-distribution as given by Eq. (12-51) the variable change corresponding 
to Eq. (12-93). The result is 



but we will do nothing further with it, since we have accomplished our 
purpose via the t-test. 


PROBLEMS 

1. For observations distributed normally: 

(a) What is the expected number out of 10 such observations that will 
have values of $X/\sigma$ greater than or equal to unity, and what is the standard 
deviation in this number? 

(b) What is the expected number that will have $|X/\sigma|$ greater than or 
equal to 0.6745, and what is the standard deviation in this number? 

Answer: (a) 1.59 ± 1.16 (b) 5.0 ± 1.6 





2. For 10 observations taken from a normal distribution, write the expression 
from which one would find the distribution function for those having 
errors such that $-1 < X/\sigma < 1$. 

Answer: $C(10, R)\,(0.6827)^{R}\,(0.3173)^{10-R}$ 

3. For a pair of fair dice: 

(a) What are the probabilities for getting each of the possible sums 
from 2 to 12 in a single toss? 

(b) For 12 tosses, write the expression from which the distribution 
function for sevens would be calculated. 

(c) For 12 tosses, what are the expectation values and standard devia- 
tions for sevens, for either four or ten, for either nine or eleven? 

Answer: (a) 2 or 12, 1/36; 3 or 11, 1/18; 4 or 10, 1/12; 5 or 9, 1/9; 
6 or 8, 5/36; 7, 1/6 

(b) $C(12, R)\,(1/6)^{R}\,(5/6)^{12-R}$ (c) 7, 2 ± 1.3; 4 or 10, 2 ± 1.3; 9 or 11, 
2 ± 1.3 

4. You are presented with the following numbers of counts/min for a single 
radioactive sample: 33.0, 32.2, 32.3, 31.6, 31.0, 32.6, 32.8, 31.1. Assuming 
that each figure was deduced from approximately the same total count, 
estimate that count and the time interval that was used. 

Answer: C = 1780 counts, t = 55.5 min 

5. Ten fair dice are rolled. What is the probability that two of them show 
an ace, five show either 2 or 3, and three show either 4, 5, or 6? 

Answer: 35/(36 X 27) 

6. A uniform distribution extends from -5 to +5. Of five readings, what is 
the probability that none lie between -5 and -4 or between 4 and 5, 
that one lies between -4 and -3 and one lies between 3 and 4, and that 
three lie between -3 and +3? 

Answer: 0.0432 

7. For readings having a uniform probability of occurrence between — a and 
+a, and zero probability of occurrence outside these limits: 

(a) Show that the distribution function of the range is 

$$\frac{n(n - 1)}{(2a)^n}\, r^{n-2}\,(2a - r).$$

[Hint: Note that the possible values of x, using the notation of Section 
12-4, for a fixed value of r, range from -a to a - r.] 

(b) What is the most probable value of the range? 

(c) What is the expectation value of the range? 

Answer: (b) 2a (c) [(n — l)/(n+ l)](2a) 

8. For the distribution of Problem 7, and n = 2, what values of the 
range are expected to be exceeded 99% of the time and 1% of the time? 
Answer: R ~ 0.01a exceeded 99% of the time, R = 1.8a exceeded 1% 
of the time 

9. Show that the expectation value of $\chi^2$ is f and that the expectation value 
for the mean-square error in $\chi^2$, i.e., the expectation value of $(\chi^2 - f)^2$, 
is 2f. Hence, remembering the definition of $\chi^2$, and using the approximation 
methods of Chapter 3 for large values of f, show that we estimate $\sigma$ for the 
parent distribution to be $\sqrt{q^2/f}$ and the standard deviation in $\sigma$ to be 
$\sigma/\sqrt{2f}$. Compare the first of these results with Eqs. (10-24) and (10-26). 
[Hint: Write $\chi^2 = f \pm \sqrt{2f}$.] 

10. In the dart-dropping experiment, 498 observations were used to determine 
the two quantities $\mu$ and $\sigma$. What is the standard deviation in $\sigma$? Would it 
appear that the value of $\sigma$ estimated with any ordinary number of observations 
is very precise? 

Answer: 0.011; no 

11. If a single coin is tossed N times, 

$$\chi^2 = \frac{(N_h - N/2)^2 + (N_t - N/2)^2}{N(\tfrac12)(\tfrac12)}$$

with f = 1, since the number of heads, $N_h$, plus the number of tails, $N_t$, 
must equal N. Consider the data of Table 6-1 for N = 10, 100, 1000. For 
any of these three cases, can the hypothesis that the coin is good be rejected 
at the 95% level? 

Answer: 

    N       chi^2 
    10      3.20 
    100     2.00 
    1000    1.35 

Since $\chi^2_{\mathrm{crit}} = 3.84$, the coin appears to be good. 

12. After how many tosses of a single coin would a ratio of $N_h/N = 0.531$ 
cause one to reject, at the 95% level, the hypothesis that the coin is good? 

Answer: 500 

13. By making use of the additive property of $\chi^2$, group the first five, the second 
five, and the third five values of $\chi^2$ from Table 12-5. 

(a) What is the value of f for each of the three groups? 

(b) Do these values of $\chi^2$ still support the hypothesis that the proposed 
distribution describes the data satisfactorily? 

(c) What is the average of these three values of $\chi^2$? 

(d) What is the expectation value of this $\chi^2$ and its standard deviation? 

Answer: (a) 5 (b) yes (c) 4.73 (d) $5 \pm \sqrt{10}$ 

14. During the course of the derivation of the $\chi^2$-distribution it was found that 
$\bar{x}$ is normally distributed. Considering the meaning of the quantity t of 
Section 12-6, what would you expect its distribution to approach for 
large values of f? Verify your expectation. Hint: Use $\Gamma(n) = (n - 1)!$ 
and Eq. (8-16). Note also that, if n is a large number, 

$$(1 + a/n)^n \to 1 + n(a/n) + \frac{n(n - 1)}{2!}(a/n)^2 + \cdots \to e^{a}.$$

Answer: normal distribution 




15. We found from Eq. (12-31) that $\bar{x}$ is normally distributed, but with the 
unknown parameters $\mu$ and $\sigma/\sqrt{n}$. Let our estimate of $\sigma$ be called s, i.e., 

$$s^2 = q^2/f,$$

and then show that the quantity 

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has Student's t distribution. [Hint: Change variables from the old $\bar{x}$ and q 
to the new s and t and work only with the necessary variable quantities 
from Eq. (12-31).] 

16. In Table B are given the probabilities of |t| exceeding or equaling the 
table entry. What is the probability that, at f = 10, t < 2.228? 

Answer: 0.975 

17. The $\chi^2$-test is a general test for fit, not only as regards the form of an 
assumed distribution but also its parameters. On this basis the data of 
Table 12-4 were rejected as coming from a normal distribution defined by 
$\mu = -0.7229$ and $\sigma = 3.352$. Could this data have come from a normal 
distribution with $\mu = -0.7229$? 

Answer: Yes, since t (see Problem 15 above) ≈ -0.8 and 

$P(t < -0.8, f = 9) \approx 0.22$. 

18. Several measurements, all made with the same procedure, of the percent 
zinc in each of three samples of a bronze gave: 

    I        II       III 
    9.82     9.30     9.10 
    9.70     7.96     8.74 
    8.90     9.64     8.93 
    10.10    8.79     9.36 
    10.67    9.80     10.68 
    10.32    8.47     9.55 
    9.25              8.43 
    8.98              8.25 
    9.53              9.86 
                      7.80 

By working at the 95% level, examine the hypothesis that the samples are 
identical. If they appear to be so, combine the data for the two which 
appear more certain to be identical and reexamine the hypothesis. 

Answer: 

    I and II,     t = 2.065,   t_crit = 2.160; 
    II and III,   t = 0.187,   t_crit = 2.145; 
    I and III,    t = 1.853,   t_crit = 2.110. 

The hypothesis is accepted. However, 

    I and (II + III),   t = 2.194,   t_crit = 2.069. 

The hypothesis is rejected: Sample I appears to be different if II and III 
are accepted as being identical. 

19. By evaluating and integrating $\varphi(g, f_x, f_y)\,dg$ verify that, for $f_x = 4$ and 
$f_y = 2$, there is a 90% chance that 

$$0.1441 < \frac{q_x^2/f_x}{q_y^2/f_y} < 19.25$$

and that, for $f_x = 2$ and $f_y = 4$, there is the same chance that 

$$0.0519 < \frac{q_x^2/f_x}{q_y^2/f_y} < 6.94.$$

20. Do the data given in Problem 18 above bear out the statement that all the 
measurements were made by the same procedure? 

Answer: Yes, typically, F(I, II) = 1.41, $F_{\mathrm{crit}}$(5%) = 3.69. 

21. Consider the distribution derived from the 498 observations in the dart- 
dropping experiment as being the true parent distribution from which the 
data of Tables 12-2, 12-3, and 12-4 supposedly were taken. Examine the 
ratios of the estimates of the variance to the true variance to see if this 
supposition is borne out. [Hint: Note the columns in Table D for infinite 
degrees of freedom.] 

Answer: For Table 12-2, F = 0.875, $F_{\mathrm{crit}}$ = 0.369; accept the supposition. 
For Table 12-3, F = 2.18, $F_{\mathrm{crit}}$ = 1.88 at 5%, = 2.41 at 1%; accept the 
supposition at the former level; reject at the latter. For Table 12-4, 
F = 0.0436, $F_{\mathrm{crit}}$ = 0.232; reject the supposition even at 1% level. 

22. Five samples of a bronze, analyzed for percent zinc by each of two different 
procedures gave the following results: 

    Sample no.    Method I    Method II 
    1             9.14        9.18 
    2             9.01        9.46 
    3             9.58        9.60 
    4             9.38        9.67 
    5             10.06       9.92 

Examine the data for the probabilities that the methods are equivalent 
and that the samples have the same percent zinc. 

Answer: 

                          q^2/f     F       F_95    F_99 
    Method differences    0.0436    1.58    7.71    21.20 
    Sample differences    0.2173    7.87    6.39    15.98 
    Error                 0.0276 

The methods appear to be equivalent; the assumption that the samples 
have the same percent zinc is rejected at the 95% level but accepted at the 
99% level. 


23. Show how to construct a correlation coefficient table from Table C by 
evaluating r to 3 decimal places for 3, 5, and 10 observations at the 95% 
and 99% rejection levels. 

Answer: 

    f    95       99 
    1    0.997    1.000 
    3    0.878    0.959 
    8    0.632    0.765 

24. The tensile strength in 100 lb/in² and the hardness on the Rockwell E 
scale were measured for each of ten samples of die-cast aluminum, with 
the following results: 

    Tensile strength    Hardness 
    377                 70 
    247                 56 
    348                 86 
    298                 60 
    287                 72 
    292                 51 
    345                 88 
    380                 95 
    257                 51 
    258                 75 

Find r from Eq. (12-77), t from Eq. (12-89), and compare with Eq. (12-93). 
How likely is it that the hypothesis of no correlation is correct? 

Answer: r = 0.708, t = 2.83. There is a chance of only slightly more 
than two times in 100 that the observed correlation would occur when, in 
fact, the variables were uncorrelated. 



APPENDIX 1 


NORMALIZATION OF THE NORMAL 
DISTRIBUTION 


The integration of Eq. (8-10) is facilitated by a change of variable. Let 
t = hx. Then 

$$1 = \int_{-\infty}^{\infty} (k/h)\,e^{-t^2}\,dt = (2k/h)\int_0^{\infty} e^{-t^2}\,dt.$$

In order to evaluate this integral, let us consider 

$$I = \int_0^{R} e^{-t^2}\,dt,$$

where R is a large number which will later be allowed to approach infinity. 
Since this is a definite integral, it will have the same numerical value if 
any other variable is substituted for t. Therefore one can also write 

$$I^2 = \int_0^{R} e^{-x^2}\,dx \int_0^{R} e^{-y^2}\,dy = \int_0^{R}\!\!\int_0^{R} e^{-(x^2 + y^2)}\,dx\,dy.$$

To verify the last step, we need only perform the integration indicated 
by the double integral on the right-hand side, and since the variables are 
separable, the expression will break down into the product of the two 
integrals. The double integral can be represented geometrically as the 
volume under a surface of revolution obtained by rotating a normal 
distribution function about the z-axis as shown in Fig. A1-1. The volume 
represented by this integral is in fact that portion of space in the first 
octant between the surface of revolution, the xy-plane, the yz-plane and 
the xz-plane, and is cut off at x = R and y = R, as shown in the figure. 
If R is large, we can determine the same volume approximately by means 
of the following integral: 

$$M = \int_0^{\pi/2}\!\!\int_0^{R} r e^{-r^2}\,dr\,d\theta,$$

which gives that part of the volume bounded by the surface of revolution, 
the xy-, yz-, and xz-planes, and the circle r = R. This integral is always 
less than $I^2$. If the integration over r is carried out between the limits 0 
and $R\sqrt{2}$, we can see from the figure that the volume included will always 
be greater than $I^2$. That is, if 

$$N = \int_0^{\pi/2}\!\!\int_0^{R\sqrt{2}} r e^{-r^2}\,dr\,d\theta,$$

then $M < I^2 < N$. But 

$$M = \frac{\pi}{2}\int_0^{R} e^{-r^2}\, r\,dr = \frac{\pi}{4}\left[1 - e^{-R^2}\right],$$

which approaches $\pi/4$ when $R \to \infty$. Since we can similarly show that N 
approaches $\pi/4$ when $R \to \infty$, we find that $I^2 = \pi/4$ and 

$$\int_0^{\infty} e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}.$$
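
The squeeze between the two circular-sector volumes can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it evaluates I by a composite Simpson rule (a numerical method chosen here, not part of the text), and the function names are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def bounds(R):
    """Return (M, I^2, N) for the cutoff R used in this appendix."""
    I = simpson(lambda t: math.exp(-t * t), 0.0, R)
    M = math.pi / 4 * (1 - math.exp(-R * R))        # quarter circle of radius R
    N = math.pi / 4 * (1 - math.exp(-2 * R * R))    # quarter circle of radius R*sqrt(2)
    return M, I * I, N

for R in (0.5, 1.0, 2.0, 3.0):
    M, I2, N = bounds(R)
    print(R, M, I2, N)
```

For each R the middle number lies between the other two, and all three close in on $\pi/4$ as R grows.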



APPENDIX 2 


EVALUATION OF THE STANDARD DEVIATION 
FOR THE NORMAL DISTRIBUTION 


It was shown in Appendix 1 that 

$$\int_0^{\infty} e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}.$$

If one sets t = hx so that dt = h dx, then 

$$\int_0^{\infty} e^{-h^2x^2}\,dx = \frac{\sqrt{\pi}}{2h}.$$

While h is constant for the integration, the result must be true for any 
value of h. Thus each side of the equation can be differentiated with respect 
to h to yield 

$$\int_0^{\infty} -2hx^2 e^{-h^2x^2}\,dx = -\frac{\sqrt{\pi}}{2h^2}.$$

Hence 

$$\int_0^{\infty} x^2 e^{-h^2x^2}\,dx = \frac{\sqrt{\pi}}{4h^3}.$$

This device, by which the value of a difficult definite integral can be 
derived from a known one, will be used extensively in Appendix 6. 
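
As a check on the device, the integral of $x^2 e^{-h^2x^2}$ from 0 to infinity should equal $\sqrt{\pi}/(4h^3)$, the value obtained by differentiating $\sqrt{\pi}/(2h)$ with respect to h. The sketch below compares the two numerically; the midpoint rule, the finite upper limit 10/h, and the function names are assumptions made only for this illustration.

```python
import math

def midpoint(f, a, b, n=20000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def second_moment(h):
    """Numerical integral of x^2 exp(-h^2 x^2) from 0 to 10/h (tail is negligible)."""
    return midpoint(lambda x: x * x * math.exp(-(h * x) ** 2), 0.0, 10.0 / h)

for h in (0.5, 1.0, 2.0):
    print(h, second_moment(h), math.sqrt(math.pi) / (4 * h ** 3))
```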
APPENDIX 3 


SUMS OF POWERS OF INTEGERS; 
SOURCE OF MATHEMATICAL TOOLS 


A. The sums of powers of the first n integers can be found by the following 
procedure: 

$$\sum_{k=0}^{n} [(k + 1)^2 - k^2] = [1^2 - 0^2] + [2^2 - 1^2] + \cdots + [n^2 - (n - 1)^2] + [(n + 1)^2 - n^2] = (n + 1)^2.$$

Therefore, by expanding a term on the left, we obtain 

$$\sum_{k=0}^{n} (2k + 1) = 2\sum_{k=0}^{n} k + (n + 1) = 2\sum_{k=1}^{n} k + (n + 1) = (n + 1)^2,$$

from which we get 

$$\sum_{k=1}^{n} k = \tfrac{1}{2}\, n(n + 1).$$

Similarly, 

$$\sum_{k=0}^{n} [(k + 1)^3 - k^3] = 3\sum_{k=1}^{n} k^2 + 3\sum_{k=1}^{n} k + (n + 1) = (n + 1)^3.$$

Knowing $\sum_{k=1}^{n} k$, we find that 

$$\sum_{k=1}^{n} k^2 = \tfrac{1}{6}\, n(n + 1)(2n + 1).$$

By successively increasing the powers of (k + 1) and of k on the left-hand 
side of the starting equation, we can find 

$$\sum_{k=1}^{n} k^3 = \tfrac{1}{4}\, n^2(n + 1)^2,$$

$$\sum_{k=1}^{n} k^4 = \tfrac{1}{30}\, n(n + 1)(2n + 1)(3n^2 + 3n - 1), \quad \text{etc.}$$

B. The sum of the squares of the odd integers up to 2r - 1 can be found 
by subtracting the sum of the squares of the even integers up to 2r - 2 
from the sum of the squares of the first 2r - 1 integers. Note that 
the numbers 2, 4, 6, 8, . . . are twice 1, 2, 3, 4, . . . , so that 

$$(2^2 + 4^2 + 6^2 + \cdots) = 4(1^2 + 2^2 + 3^2 + \cdots).$$
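
The closed forms of part A and the subtraction procedure of part B are easy to confirm by brute force. The following sketch does so in exact integer arithmetic; the function names are illustrative assumptions.

```python
def power_sum(n, p):
    """Direct sum 1^p + 2^p + ... + n^p."""
    return sum(k ** p for k in range(1, n + 1))

def closed_form(n, p):
    """Closed forms from part A for p = 1, 2, 3, 4."""
    if p == 1:
        return n * (n + 1) // 2
    if p == 2:
        return n * (n + 1) * (2 * n + 1) // 6
    if p == 3:
        return n * n * (n + 1) ** 2 // 4
    if p == 4:
        return n * (n + 1) * (2 * n + 1) * (3 * n * n + 3 * n - 1) // 30
    raise ValueError("only p = 1..4 are tabulated here")

def odd_square_sum(r):
    """Part B: squares of the first 2r - 1 integers minus squares of the even ones."""
    all_squares = closed_form(2 * r - 1, 2)
    even_squares = 4 * closed_form(r - 1, 2)   # 2^2 + 4^2 + ... + (2r - 2)^2
    return all_squares - even_squares

print([closed_form(10, p) for p in (1, 2, 3, 4)])
print(odd_square_sum(5))   # 1 + 9 + 25 + 49 + 81
```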

C. The reader is referred to Handbook of Chemistry and Physics (Cleveland: 
Chemical Rubber), especially the newer editions, as an excellent source 
of numerical tables, integrals, differentials, general mathematical and 
statistical formulas, and approximations to various mathematical 
functions. 



APPENDIX 4 


WEIGHTS FOR QUANTITIES THAT HAVE BEEN 
EVALUATED BY THE METHOD OF LEAST SQUARES 


From the definition of $\tau_i$ following Eq. (11-16) we see that 

$$[a\tau] = \beta_1[waa] + \beta_2[wab] + \cdots + \beta_q[waq],$$
$$[b\tau] = \beta_1[wab] + \beta_2[wbb] + \cdots + \beta_q[wbq], \tag{A4-1}$$
$$\vdots$$
$$[q\tau] = \beta_1[waq] + \beta_2[wbq] + \cdots + \beta_q[wqq].$$

Although the numerators of the $\beta$'s were defined as the minors, with 
appropriate algebraic sign, of the terms in the second column of the 
determinant which appears explicitly in Eq. (11-12), they are also seen 
to be the minors, with appropriate sign, of the terms in the second column 
of the determinant for $\Delta$, shown in Eq. (11-13). The denominators of 
$[a\tau]$, $[b\tau]$, etc. are $\Delta$; if the numerators are written as determinants, it is 
seen that all of them except that for $[b\tau]$ have two identical columns and 
hence are zero [12]. The numerator of $[b\tau]$, however, is seen to be $\Delta$ so 
that the left-hand sides of all the equations (A4-1) are zero except the 
second, which is one. 

It can now be shown that $\beta_2 = [\tau\tau/w]$. The summation $[\tau\tau/w]$ must be 
calculated from the values of the $\tau$'s which were given after Eq. (11-16). 
First, the expressions for $\tau_1$, $\tau_2$, . . . are squared: 

$$\tau_1^2 = (\beta_1 w_1 a_1)^2 + (\beta_2 w_1 b_1)^2 + (\beta_3 w_1 c_1)^2 + \cdots + (\beta_q w_1 q_1)^2 + 2\beta_1\beta_2 w_1^2 a_1 b_1 + 2\beta_1\beta_3 w_1^2 a_1 c_1 + \cdots,$$

$$\tau_2^2 = (\beta_1 w_2 a_2)^2 + (\beta_2 w_2 b_2)^2 + (\beta_3 w_2 c_2)^2 + \cdots + (\beta_q w_2 q_2)^2 + 2\beta_1\beta_2 w_2^2 a_2 b_2 + 2\beta_1\beta_3 w_2^2 a_2 c_2 + \cdots, \quad \text{etc.} \tag{A4-2}$$

Next, the first of these equations is divided by $w_1$, the second by $w_2$, and 
so on, and the results are added to give 

$$[\tau\tau/w] = \beta_1^2[waa] + \beta_2^2[wbb] + \beta_3^2[wcc] + \cdots + \beta_q^2[wqq] + 2\beta_1\beta_2[wab] + 2\beta_1\beta_3[wac] + \cdots.$$

Note that the first term is the sum of all the first terms of Eqs. (A4-2), 
the second term is the sum of all the second terms, etc. This equation can 
be rearranged in the following form: 

$$[\tau\tau/w] = \beta_1\{\beta_1[waa] + \beta_2[wab] + \cdots + \beta_q[waq]\}$$
$$\qquad + \beta_2\{\beta_1[wab] + \beta_2[wbb] + \cdots + \beta_q[wbq]\}$$
$$\qquad + \beta_3\{\beta_1[wac] + \beta_2[wbc] + \cdots + \beta_q[wcq]\} + \cdots$$
$$\qquad + \beta_q\{\beta_1[waq] + \beta_2[wbq] + \cdots + \beta_q[wqq]\}.$$

But our examination of Eqs. (A4-1) has shown that the first line of the 
expression on the right-hand side is zero, the second line is $\beta_2 \times 1$, and the 
third and all subsequent lines are equal to zero. The entire expression on 
the right-hand side then reduces to $\beta_2$, so that $[\tau\tau/w] = \beta_2$. 



APPENDIX 5 


A DEFINITE INTEGRAL RELATED TO 
THE SPREAD BETWEEN PAIRS OF OBSERVATIONS 


Let 

$$I(w) = \int_{-\infty}^{\infty} e^{-(u^2 + uw)}\,du.$$

Since the first power of u appears in the integral, we cannot use the device 
of letting this integral be twice the integral from 0 to $\infty$. We must begin by 
integrating by parts with 

$$U = e^{-u^2}, \qquad dV = e^{-uw}\,du.$$

Then 

$$I(w) = \left[-\frac{1}{w}\,e^{-(u^2 + uw)}\right]_{-\infty}^{\infty} - \frac{2}{w}\int_{-\infty}^{\infty} u\,e^{-(u^2 + uw)}\,du,$$

where the integrated part is zero. We next use the device of Appendix 2 
and note that 

$$\frac{dI}{dw} = -\int_{-\infty}^{\infty} u\,e^{-(u^2 + uw)}\,du.$$

Thus 

$$I(w) = \frac{2}{w}\,\frac{dI}{dw},$$

the solution of which is 

$$I(w) = Ce^{w^2/4},$$

where ln C is the constant of integration. To evaluate C, we note that 

$$I(0) = C = \int_{-\infty}^{\infty} e^{-u^2}\,du,$$

which, according to Appendix 1, is $\sqrt{\pi}$. Thus 

$$I(w) = \sqrt{\pi}\,e^{w^2/4}.$$
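
The result $I(w) = \sqrt{\pi}\,e^{w^2/4}$ can be compared with a direct numerical integration. The sketch below is an illustration only: the midpoint rule and the finite integration limits (chosen wide enough that the neglected tails are negligible for the values of w used) are assumptions, as are the function names.

```python
import math

def midpoint(f, a, b, n=20000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def I_numeric(w):
    """Integral of exp(-(u^2 + u*w)) over a wide finite range about the peak at -w/2."""
    half_width = 12.0 + abs(w)
    return midpoint(lambda u: math.exp(-(u * u + u * w)), -half_width, half_width)

def I_closed(w):
    return math.sqrt(math.pi) * math.exp(w * w / 4)

for w in (-2.0, 0.0, 1.0, 3.0):
    print(w, I_numeric(w), I_closed(w))
```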
APPENDIX 6 


CERTAIN DEFINITE INTEGRALS 


The methods used in Appendix 2 can be extended to the general case. 
Consider 

$$\int_0^{\infty} e^{-au}\,du = \frac{1}{a}.$$

If both sides of this equation are successively differentiated with respect 
to a, one obtains 

$$\int_0^{\infty} u\,e^{-au}\,du = \frac{1}{a^2}, \qquad \int_0^{\infty} u^2 e^{-au}\,du = \frac{2 \cdot 1}{a^3}, \qquad \int_0^{\infty} u^3 e^{-au}\,du = \frac{3 \cdot 2 \cdot 1}{a^4}.$$

It seems clear that if this process were carried out some indefinite number 
of times, one would find that 

$$\int_0^{\infty} u^{j} e^{-au}\,du = \frac{j!}{a^{j+1}}.$$

This result is useful provided that j is an integer, but we are just as 
likely to be faced with cases where j is half-integral. To examine the 
latter, let k be an integer, $j = k - \tfrac12$, and $u = v^2$. Then 

$$\int_0^{\infty} u^{j} e^{-au}\,du = 2\int_0^{\infty} v^{2k} e^{-av^2}\,dv.$$

Setting $h^2 = a$, we obtain from the second equation of Appendix 2, 

$$\int_0^{\infty} e^{-av^2}\,dv = \frac{\sqrt{\pi}}{2a^{1/2}}.$$

By again using the device of successive differentiation, we get 

$$2\int_0^{\infty} v^{2k} e^{-av^2}\,dv = \frac{\sqrt{\pi} \cdot 1 \cdot 3 \cdot 5 \cdots (2k - 1)}{2^{k}\, a^{(2k+1)/2}}.$$

Thus when j is half-integral, 

$$\int_0^{\infty} u^{j} e^{-au}\,du = \frac{j(j - 1)(j - 2) \cdots (\tfrac12)\sqrt{\pi}}{a^{j+1}},$$

which, we see, bears a certain resemblance to the result for integral j. 
The leading factors are $j(j - 1)(j - 2) \cdots (\tfrac12)$, which is like j! except 
that all the factors are half-integral. 

It would be convenient to have one notation to describe both these 
cases; such a notation in fact exists, and it is called the gamma function. 
The complete discussion of this function certainly is not a proper topic for 
this book, but those properties of it which are needed for the present 
application can be described easily. The gamma function of a number n 
is written $\Gamma(n)$. For any positive n, 

$$\Gamma(n + 1) = n\Gamma(n).$$

Furthermore, 

$$\Gamma(1) = 1, \qquad \Gamma(\tfrac12) = \sqrt{\pi}.$$

Thus if n is an integer, 

$$\Gamma(n + 1) = n\Gamma(n) = n(n - 1)\Gamma(n - 1) = n(n - 1)(n - 2)\Gamma(n - 2), \quad \text{etc.},$$

so that 

$$\Gamma(n + 1) = n!$$

If n is half-integral, then 

$$\Gamma(n + 1) = n(n - 1)\Gamma(n - 1) = n(n - 1)(n - 2)\Gamma(n - 2) = n(n - 1)(n - 2) \cdots (\tfrac12)\Gamma(\tfrac12) = n(n - 1) \cdots (\tfrac12)\sqrt{\pi}.$$

Thus for j either integral or half-integral 

$$\int_0^{\infty} u^{j} e^{-au}\,du = \frac{\Gamma(j + 1)}{a^{j+1}},$$

and the various normalizing factors for the distributions of Chapter 12 
which are related to the $\chi^2$ distribution can be described easily in terms 
of the gamma function whether the number of degrees of freedom is even 
or odd. 
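
The single formula $\Gamma(j + 1)/a^{j+1}$ can be checked for both integral and half-integral j. The sketch below uses `math.gamma` from the Python standard library together with a crude midpoint integration; the integration scheme, the finite upper limit 60/a, and the function names are assumptions made for this illustration.

```python
import math

def moment_integral(j, a, n=200000):
    """Midpoint estimate of the integral of u^j exp(-a u) from 0 to 60/a."""
    upper = 60.0 / a
    h = upper / n
    return h * sum(((k + 0.5) * h) ** j * math.exp(-a * (k + 0.5) * h) for k in range(n))

def half_integral_product(j):
    """j (j - 1) (j - 2) ... (1/2) sqrt(pi) for half-integral j >= 1/2."""
    product = math.sqrt(math.pi)
    term = j
    while term > 0:
        product *= term
        term -= 1.0
    return product

for j in (1.5, 2.0, 2.5):
    for a in (1.0, 2.0):
        print(j, a, moment_integral(j, a), math.gamma(j + 1) / a ** (j + 1))
```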
APPENDIX 7 


MULTIPLE INTEGRATION: JACOBIANS 


Suppose that one wished to use the rate of working to determine the total 
work done by gravity on a falling body when the body falls from a height h 
to ground level h = 0. The rate of working, which is the product of force 
and velocity, is (-mg)v so that the desired result would be 

$$W = -\int_0^{t_0} mgv\,dt,$$

where $t_0$ is the time required to fall the distance h. As an introduction to 
the subject of this appendix, we choose to work this problem in the 
following way. Instead of using t as an integration variable, we will use 
the distance above ground. This is given by 

$$s = h - \tfrac12 gt^2,$$

from which we obtain 

$$t = \left[\frac{2}{g}(h - s)\right]^{1/2} \tag{A7-1}$$

and 

$$v = \frac{ds}{dt} = -gt = -[2g(h - s)]^{1/2}. \tag{A7-2}$$

It is the use of Eqs. (A7-1) and (A7-2) which constitutes the main point 
of this example. Having them enables us to write 

$$W = -mg\int_{h}^{0} v\,\frac{dt}{ds}\,ds,$$

i.e., we replace v by an expression involving a new variable of integration, 
multiply it by the derivative of the old variable with respect to the new 
variable, and append the differential of the latter. The limits of integration 
are, of course, changed appropriately. To complete this example: 

$$\frac{dt}{ds} = -\frac{1}{[2g(h - s)]^{1/2}},$$

$$W = mg\int_0^{h} \frac{[2g(h - s)]^{1/2}}{[2g(h - s)]^{1/2}}\,ds = mgh.$$
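
The same work can be evaluated numerically with either integration variable, which shows the change of variable at work. The sketch below is a minimal illustration; the numerical values of m, g, and h, the midpoint rule, and the function names are assumptions.

```python
import math

def midpoint(f, a, b, n=10000):
    """Midpoint-rule estimate of the integral of f from a to b (a > b is allowed)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

m, g, height = 2.0, 9.8, 5.0            # illustrative values only
t0 = math.sqrt(2 * height / g)          # time to fall the distance h

# Integration over time: W = -integral of m g v dt, with v = -g t.
W_t = -m * g * midpoint(lambda t: -g * t, 0.0, t0)

# Integration over height: v and dt/ds from Eq. (A7-2) and the derivative of Eq. (A7-1).
v = lambda s: -math.sqrt(2 * g * (height - s))
dt_ds = lambda s: -1.0 / math.sqrt(2 * g * (height - s))
W_s = -m * g * midpoint(lambda s: v(s) * dt_ds(s), height, 0.0)

print(W_t, W_s, m * g * height)
```

The midpoint rule is used so that the integrand is never evaluated at s = h, where dt/ds is singular.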

Now suppose that the function to be integrated is one of two variables. 
The problem has suddenly become much more complex, so much more so, 
in fact, that proofs and the general discussions for any number of variables 
are far beyond the scope of this book. On the other hand, procedures for 
developing and using the results of such general discussion can be described 
in an illustrative and suggestive way. 

Consider 

F = y) dxdy, 

and suppose that for some reason, we wish to carry out the integration 
over plane polar coordinates rather than over the plane cartesian co- 
ordinates. The two equations which correspond to the single Eq. (A7-1) 
of the previous illustration are 

x = r cos 6, y = r sin 6. 



It is sufficient for our present purposes to say that partial differentiation 
is the differentiation of a function with respect to one designated variable, 
while the other variables are held constant. Thus in our case, dx/dr = 
cos 6, and dx/dd = — r sin 6. If the reader likes, he may consider these 
as slopes. They tell us how fast x changes per unit change in r at a constant 
value of 6, and vice versa. Hence for small changes dr and dd we obtain the 
following changes dx in x and dy in y, to the first order of small quantities: 


d * = ¥r dr + ¥e de ’ 


d y = t dr + % de - 
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Solution of these by the method of determinants yields 


dr = J 


— 1 


, dx 

dx i* 

, dy 

dy Te 


dO = J 


—i 


dx , 
t- ax 
dr 


dy 

dr 


dy 


where the Jacobian J(x, y; r, 0), also designated by 

d(x, y) 


J(r, 8) or 


d(r, 8) 


IS 


J = 


dx dx 
dr dO 

dy dy 
dr dd 


For the coordinate change of our example, 


J = r, 

dr = cos 6 dx + sin 6 dy, (A7-3) 

dd = r~ l (cos 0 dy — sin 0 dx) ; 

the last equation of (A7-3) is better written as 

r dO — cos 0 dy — sin 0 dx. (A7-4) 

Let us refer now to Fig. A7-1, where the short dashed lines serve
only as visual aids. Points $P_1$ and $P_2$ are two arbitrary points separated
by some $dx$ in the x-direction and $dy$ in the y-direction. These two
directions are perpendicular to each other, and the product $dx\,dy$ is a proper
differential area. It is seen that the $dr$ and $r\,d\theta$ shown in the figure are, to
the first order of small quantities, just those given by Eqs. (A7-3) and
(A7-4). To the same order they are mutually perpendicular, and their
product is a proper differential area. While the two areas $dx\,dy$ and $r\,d\theta\,dr$
are not equal (and there is no reason why they should be, so long as the
mutual perpendicularity between the distance elements of a given pair is
maintained), the distance between $P_1$ and $P_2$ is maintained, as it should
be. That is, if the reader likes algebra, he can show that

$$[(dr)^2 + (r\,d\theta)^2]^{1/2} = [(dx)^2 + (dy)^2]^{1/2}.$$

Thus it turns out that $dx\,dy$ is to be replaced by $J\,d\theta\,dr$, and, of course,
the x and y in the original integrand are to be replaced by $r\cos\theta$ and
$r\sin\theta$, respectively.
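As a numerical illustration of replacing $dx\,dy$ by $r\,dr\,d\theta$, the following sketch integrates one test integrand over the unit disk in both coordinate systems with a simple midpoint rule. The integrand $\exp[-(x^2 + y^2)]$, the grid sizes, and all function names are illustrative assumptions and are not taken from the text.

```python
import math

def integrand(x, y):
    # Hypothetical test integrand, chosen because its integral is known in closed form.
    return math.exp(-(x * x + y * y))

def cartesian_integral(n=400):
    """Midpoint rule on an n-by-n grid over [-1, 1]^2, keeping cells whose centre lies in the unit disk."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:
                total += integrand(x, y) * h * h      # area element dx dy
    return total

def polar_integral(nr=200, ntheta=200):
    """Midpoint rule in (r, theta); the area element is J dr dtheta with J = r."""
    dr = 1.0 / nr
    dtheta = 2.0 * math.pi / ntheta
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(ntheta):
            theta = (j + 0.5) * dtheta
            x, y = r * math.cos(theta), r * math.sin(theta)
            total += integrand(x, y) * r * dr * dtheta
    return total

# Closed-form value for this particular integrand: pi * (1 - 1/e).
exact = math.pi * (1.0 - math.exp(-1.0))
```

Both sums should approach the same value; the polar sum converges faster here only because this integrand and region happen to be circularly symmetric.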
Let us mention one more similar example. This very important one is
the change from three-dimensional cartesian coordinates to spherical
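coordinates, illustrated in Fig. A7-2. Here

$$x = r\sin\theta\cos\phi, \qquad y = r\sin\theta\sin\phi, \qquad z = r\cos\theta,$$

$$J = \begin{vmatrix}
\dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial\theta} & \dfrac{\partial x}{\partial\phi} \\[1ex]
\dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial\theta} & \dfrac{\partial y}{\partial\phi} \\[1ex]
\dfrac{\partial z}{\partial r} & \dfrac{\partial z}{\partial\theta} & \dfrac{\partial z}{\partial\phi}
\end{vmatrix} = r^2\sin\theta.$$

Thus

$$dx\,dy\,dz \to r^2\sin\theta\,dr\,d\theta\,d\phi = (dr)(r\,d\theta)(r\sin\theta\,d\phi).$$

As with the previous example, it could be found that

$$[(dr)^2 + (r\,d\theta)^2 + (r\sin\theta\,d\phi)^2]^{1/2} = [(dx)^2 + (dy)^2 + (dz)^2]^{1/2}.$$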
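The value $J = r^2\sin\theta$ can be checked numerically. The sketch below estimates each partial derivative by a central difference and evaluates the 3 by 3 determinant directly; the step size `h`, the sample point, and the function names are assumptions made for illustration only.

```python
import math

def spherical_to_cartesian(r, theta, phi):
    """x, y, z for radius r, polar angle theta, and azimuth phi."""
    return (
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

def numerical_jacobian(r, theta, phi, h=1e-6):
    """Central-difference estimate of the determinant of d(x, y, z)/d(r, theta, phi)."""
    point = [r, theta, phi]
    columns = []
    for k in range(3):
        plus, minus = point[:], point[:]
        plus[k] += h
        minus[k] -= h
        f_plus = spherical_to_cartesian(*plus)
        f_minus = spherical_to_cartesian(*minus)
        columns.append([(a - b) / (2.0 * h) for a, b in zip(f_plus, f_minus)])
    # columns[k][i] is the derivative of coordinate i with respect to variable k;
    # rearrange so that rows follow x, y, z and columns follow r, theta, phi.
    matrix = [[columns[k][i] for k in range(3)] for i in range(3)]
    return det3(matrix)

estimate = numerical_jacobian(2.0, 0.7, 1.3)
expected = 2.0 ** 2 * math.sin(0.7)   # r^2 sin(theta) at the sample point
```

The two numbers should agree to within the truncation and rounding error of the finite differences.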
CERTAIN DEFINITE INTEGRALS
OF TRIGONOMETRIC FUNCTIONS


For the sake of convenience, we will write

$$f_x = m, \qquad f_y = n, \qquad f = m + n$$

in order to discuss the evaluation of

$$\int_0^{\pi/2} (\cos\theta)^{m-1}(\sin\theta)^{n-1}\,d\theta. \tag{A8-1}$$

In any good integral table, such as the one referred to in Appendix 3, we
can find that

$$\int (\cos\theta)^{m-1}(\sin\theta)^{n-1}\,d\theta
= \frac{(\cos\theta)^{m-2}(\sin\theta)^{n}}{m + n - 2}
+ \frac{m - 2}{m + n - 2}\int (\cos\theta)^{m-3}(\sin\theta)^{n-1}\,d\theta.$$

For limiting values of 0 and $\pi/2$ the integrated part is zero. This expression
can be applied successively to (A8-1), but the results will be different
depending on whether m and n are even or odd integers.

If m is even, the result is

$$\frac{(m - 2)(m - 4)\cdots(4)(2)}{(f - 2)(f - 4)\cdots(n + 4)(n + 2)}
\int_0^{\pi/2} \cos\theta\,(\sin\theta)^{n-1}\,d\theta.$$

The numerator and denominator of the coefficient of the integral each have
$(m - 2)/2$ factors. The integral is easily evaluated; it is

$$\int_0^{\pi/2} (\sin\theta)^{n-1}\,d(\sin\theta) = \frac{1}{n}.$$

Thus the result can be written as

$$\frac{1}{2}\cdot
\frac{\tfrac{1}{2}(m - 2)\,\tfrac{1}{2}(m - 4)\cdots(2)(1)}
{\tfrac{1}{2}(f - 2)\,\tfrac{1}{2}(f - 4)\cdots\tfrac{1}{2}(n + 4)\,\tfrac{1}{2}(n + 2)\,\tfrac{1}{2}n}.$$

From the discussion of the gamma function in Appendix 6, we see that we
can write the numerator as $\Gamma(m/2)$. Furthermore, regardless of whether
n is even or odd, we can multiply the numerator and denominator by
$\Gamma(n/2)$, the first factor of which is $(n - 2)/2$. Each factor in the
denominator after the first is one less than the one preceding it; to multiply by
$\Gamma(n/2)$ continues this progression past $n/2$. If n is even, then so is f;
if n is odd, then f is also odd. In either case, the final result is

$$\int_0^{\pi/2} (\cos\theta)^{m-1}(\sin\theta)^{n-1}\,d\theta
= \frac{1}{2}\,\frac{\Gamma(m/2)\,\Gamma(n/2)}{\Gamma(f/2)}.$$

We shall not repeat the argument for the case of odd m. If the reader
cares to, he can verify that the result given above is good for any
combination of odd or even m and n.
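A reader who prefers arithmetic to algebra can make this verification numerically. The sketch below compares a composite Simpson's-rule estimate of the integral with the gamma-function expression $\tfrac{1}{2}\Gamma(m/2)\Gamma(n/2)/\Gamma[(m+n)/2]$ for small positive integers m and n. The quadrature rule, the number of steps, and the function names are choices made here for illustration.

```python
import math

def trig_integral_numeric(m, n, steps=2000):
    """Composite Simpson estimate of the integral of cos^(m-1) * sin^(n-1) over [0, pi/2].

    `steps` must be even.
    """
    a, b = 0.0, math.pi / 2.0
    h = (b - a) / steps

    def g(t):
        return math.cos(t) ** (m - 1) * math.sin(t) ** (n - 1)

    total = g(a) + g(b)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(a + k * h)
    return total * h / 3.0

def trig_integral_gamma(m, n):
    """Closed form (1/2) * Gamma(m/2) * Gamma(n/2) / Gamma((m + n)/2)."""
    return 0.5 * math.gamma(m / 2.0) * math.gamma(n / 2.0) / math.gamma((m + n) / 2.0)

# Largest disagreement over all combinations of odd and even m, n from 1 to 6.
worst = max(
    abs(trig_integral_numeric(m, n) - trig_integral_gamma(m, n))
    for m in range(1, 7)
    for n in range(1, 7)
)
```

The quantity `worst` should be of the order of the quadrature error only, for odd and even values alike.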
SOLUTION OF SIMULTANEOUS EQUATIONS


The following method for solving simultaneous equations is an extension 
of the process of successive elimination of the unknowns, which is used 
in the algebraic solution of small numbers of equations. Thus it can be 
applied to any set of equations that has a solution. 

Here we will designate the unknowns by $x_i$ and write the equations
as:

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= z_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= z_2, \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= z_n.
\end{aligned}$$

Numerical accuracy is best maintained if the unknowns are eliminated
in the order of increasing absolute magnitude. It is supposed that the
equations have been written so that, to the best of one's knowledge,

$$|x_1| < |x_2| < \cdots < |x_n|;$$

and the method will be described so that $x_1$ is eliminated first, $x_2$ is
eliminated second, and so on. If one has a set of equations for which
it is judged that reordering is necessary to satisfy this requirement,
the reordering need only be done within each equation for general use
of the method. Whenever the equations possess the symmetry
exhibited by least-squares normal equations when they are set down in the
order in which one would naturally arrange them, so that $a_{ij} = a_{ji}$, and
reordering seems desirable, then the equations should also be reordered
so that this symmetry is kept. It will be seen that the existence of this
symmetry reduces the labor of calculation.

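We next set up the pattern shown in Table A9-1, noting as we do so
that $a_{ij}$ has been put into the (j, i) position. The equation in the upper
right-hand corner of this pattern means that $x_1$ is to be eliminated from
the equations; we also note that this equation will be used for evaluating
$x_1$ after all the other $x_i$ have been found.

Table A9-1

$$
\begin{array}{c|c}
\begin{array}{ccccc} z_1 & z_2 & z_3 & \cdots & z_n \end{array}
& x_1 = \dfrac{1}{a_{11}}\,(z_1 - a_{12}x_2 - \cdots - a_{1n}x_n) \\[1ex]
\hline
A:\ \begin{array}{ccccc}
a_{11} & a_{21} & a_{31} & \cdots & a_{n1} \\
a_{12} & a_{22} & a_{32} & \cdots & a_{n2} \\
a_{13} & a_{23} & a_{33} & \cdots & a_{n3} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{1n} & a_{2n} & a_{3n} & \cdots & a_{nn}
\end{array}
&
B:\ \begin{array}{ccccc}
-\dfrac{a_{21}}{a_{11}} & -\dfrac{a_{31}}{a_{11}} & -\dfrac{a_{41}}{a_{11}} & \cdots & -\dfrac{a_{n1}}{a_{11}} \\
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{array}
\end{array}
$$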



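After $x_1$ is eliminated we will have a new set of $(n - 1)$ equations in
$(n - 1)$ unknowns. Thus we need to show the elimination process only
once; the same process can be applied for any value of n. We will designate
the new values of z with primes and use $b_{ij}$ for the new coefficients of the
remaining x's. The values of $z_i'$ are found by summing products of the
terms across the row of $z_i$ and down a column in part B; we use subscripts
in such a way as to show that it is $x_1$ that has been eliminated. Thus

$$\begin{aligned}
z_2' &= z_1\left(-\frac{a_{21}}{a_{11}}\right) + z_2(1) + z_3(0) + \cdots = z_1\left(-\frac{a_{21}}{a_{11}}\right) + z_2, \\
z_3' &= z_1\left(-\frac{a_{31}}{a_{11}}\right) + z_3, \\
z_4' &= z_1\left(-\frac{a_{41}}{a_{11}}\right) + z_4, \\
&\;\;\vdots \\
z_n' &= z_1\left(-\frac{a_{n1}}{a_{11}}\right) + z_n.
\end{aligned}$$

The values of $b_{ij}$ are found in a similar way except that the entries in rows
of A, starting with the second row, are used instead of the values of $z_i$.
For example,

$$\begin{aligned}
b_{22} &= a_{12}\left(-\frac{a_{21}}{a_{11}}\right) + a_{22}, \\
&\;\;\vdots \\
b_{n2} &= a_{12}\left(-\frac{a_{n1}}{a_{11}}\right) + a_{n2}, \\
&\;\;\vdots \\
b_{33} &= a_{13}\left(-\frac{a_{31}}{a_{11}}\right) + a_{33}, \\
&\;\;\vdots \\
b_{nn} &= a_{1n}\left(-\frac{a_{n1}}{a_{11}}\right) + a_{nn}.
\end{aligned}$$

Then the new pattern is:

$$
\begin{array}{c|c}
\begin{array}{cccc} z_2' & z_3' & \cdots & z_n' \end{array}
& x_2 = \dfrac{1}{b_{22}}\,(z_2' - b_{23}x_3 - \cdots - b_{2n}x_n) \\[1ex]
\hline
\begin{array}{cccc}
b_{22} & b_{32} & \cdots & b_{n2} \\
b_{23} & b_{33} & \cdots & b_{n3} \\
\vdots & \vdots & & \vdots \\
b_{2n} & b_{3n} & \cdots & b_{nn}
\end{array}
&
\begin{array}{cccc}
-\dfrac{b_{32}}{b_{22}} & -\dfrac{b_{42}}{b_{22}} & \cdots & -\dfrac{b_{n2}}{b_{22}} \\
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{array}
\end{array}
$$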

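In order to make the ending of this process clear we now set down the
complete solution for three equations in three unknowns:

$$
\begin{array}{c|c}
\begin{array}{ccc} z_1 & z_2 & z_3 \end{array}
& x_1 = \dfrac{1}{a_{11}}\,(z_1 - a_{12}x_2 - a_{13}x_3) \\[1ex]
\hline
\begin{array}{ccc}
a_{11} & a_{21} & a_{31} \\
a_{12} & a_{22} & a_{32} \\
a_{13} & a_{23} & a_{33}
\end{array}
&
\begin{array}{cc}
-\dfrac{a_{21}}{a_{11}} & -\dfrac{a_{31}}{a_{11}} \\
1 & 0 \\
0 & 1
\end{array}
\end{array}
$$

$$
\begin{array}{c|c}
\begin{array}{cc} z_2' & z_3' \end{array}
& x_2 = \dfrac{1}{b_{22}}\,(z_2' - b_{23}x_3) \\[1ex]
\hline
\begin{array}{cc}
b_{22} & b_{32} \\
b_{23} & b_{33}
\end{array}
&
\begin{array}{c}
-\dfrac{b_{32}}{b_{22}} \\
1
\end{array}
\end{array}
$$

$$
\begin{array}{c|c}
z_3'' & x_3 = \dfrac{1}{c_{33}}\,(z_3'') \\[1ex]
\hline
c_{33} &
\end{array}
$$

After having determined $x_3$ we now go back up the equations included
in the pattern to determine successively $x_2$ and $x_1$.

We can now see the reason for ordering the normal equations in the
manner described above and in the text. If the unknowns are reordered
in each equation and the equations are not also reordered, the symmetry
will be lost. It can be seen that if $a_{ij} = a_{ji}$, then $b_{ij} = b_{ji}$, and so on; the
labor of calculation is reduced accordingly.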
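The elimination pattern can be sketched in a few lines of code. The functions below are a minimal illustration and not a reproduction of the tabular layout: `eliminate_first` forms $b_{ij} = a_{1j}(-a_{i1}/a_{11}) + a_{ij}$ and $z_i' = z_1(-a_{i1}/a_{11}) + z_i$, and `solve` repeats the step and then works back up through the equations. The function names and the sample coefficients are assumptions, and the sketch assumes that no leading coefficient met along the way is zero.

```python
def eliminate_first(a, z):
    """Eliminate x_1 from equations 2..n.

    a[i][j] is the coefficient of x_(j+1) in equation i+1; z[i] is its right-hand side.
    Returns the coefficients b and right-hand sides z' of the reduced set.
    """
    n = len(z)
    ratios = [-a[i][0] / a[0][0] for i in range(1, n)]      # first row of part B
    b = [
        [a[0][j] * ratios[i - 1] + a[i][j] for j in range(1, n)]
        for i in range(1, n)
    ]
    z_new = [z[0] * ratios[i - 1] + z[i] for i in range(1, n)]
    return b, z_new

def solve(a, z):
    """Solve the equations by successive elimination followed by back substitution."""
    if len(z) == 1:
        return [z[0] / a[0][0]]
    b, z_new = eliminate_first(a, z)
    rest = solve(b, z_new)                                   # x_2, ..., x_n
    # x_1 = (1/a_11) (z_1 - a_12 x_2 - ... - a_1n x_n)
    x1 = (z[0] - sum(a[0][j + 1] * rest[j] for j in range(len(rest)))) / a[0][0]
    return [x1] + rest

# Hypothetical symmetric example with known solution x = (1, 2, 3).
a = [[4.0, 2.0, 1.0],
     [2.0, 5.0, 3.0],
     [1.0, 3.0, 6.0]]
z = [11.0, 21.0, 25.0]
x = solve(a, z)
b, _ = eliminate_first(a, z)
```

For this symmetric set the reduced coefficients satisfy `b[0][1] == b[1][0]`, so in hand calculation only the entries on and above the diagonal need to be computed.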
Table B

χ²-values

Degrees of
freedom P = 0.99      0.98      0.95      0.90      0.80      0.70

     1  0.000157  0.000628   0.00393    0.0158    0.0642     0.148
     2    0.0201    0.0404     0.103     0.211     0.446     0.713
     3     0.115     0.185     0.352     0.584     1.005     1.424
     4     0.297     0.429     0.711     1.064     1.649     2.195
     5     0.554     0.752     1.145     1.610     2.343     3.000
     6     0.872     1.134     1.635     2.204     3.070     3.828
     7     1.239     1.564     2.167     2.833     3.822     4.671
     8     1.646     2.032     2.733     3.490     4.594     5.527
     9     2.088     2.532     3.325     4.168     5.380     6.393
    10     2.558     3.059     3.940     4.865     6.179     7.267
    11     3.053     3.609     4.575     5.578     6.989     8.148
    12     3.571     4.178     5.226     6.304     7.807     9.034
    13     4.107     4.765     5.892     7.042     8.634     9.926
    14     4.660     5.368     6.571     7.790     9.467    10.821
    15     5.229     5.985     7.261     8.547    10.307    11.721
    16     5.812     6.614     7.962     9.312    11.152    12.624
    17     6.408     7.255     8.672    10.085    12.002    13.531
    18     7.015     7.906     9.390    10.865    12.857    14.440
    19     7.633     8.567    10.117    11.651    13.716    15.352
    20     8.260     9.237    10.851    12.443    14.578    16.266
    21     8.897     9.915    11.591    13.240    15.445    17.182
    22     9.542    10.600    12.338    14.041    16.314    18.101
    23    10.196    11.293    13.091    14.848    17.187    19.021
    24    10.856    11.992    13.848    15.659    18.062    19.943
    25    11.524    12.697    14.611    16.473    18.940    20.867
    26    12.198    13.409    15.379    17.292    19.820    21.792
    27    12.879    14.125    16.151    18.114    20.703    22.719
    28    13.565    14.847    16.928    18.939    21.588    23.647
    29    14.256    15.574    17.708    19.768    22.475    24.577
    30    14.953    16.306    18.493    20.599    23.364    25.508

For degrees of freedom greater than 30, the expression $\sqrt{2\chi^2} - \sqrt{2n' - 1}$ may
be used as a normal deviate with unit variance, where $n'$ is the number of degrees of
freedom. (This Table is taken from Table III of R. A. Fisher, Statistical Methods for
Research Workers, 12th ed., 1954, published by Oliver & Boyd Ltd., Edinburgh, by
permission of the author and publishers.)

Degrees of
freedom     0.50      0.30      0.20      0.10      0.05      0.02      0.01

     1     0.455     1.074     1.642     2.706     3.841     5.412     6.635
     2     1.386     2.408     3.219     4.605     5.991     7.824     9.210
     3     2.366     3.665     4.642     6.251     7.815     9.837    11.341
     4     3.357     4.878     5.989     7.779     9.488    11.668    13.277
     5     4.351     6.064     7.289     9.236    11.070    13.388    15.086
     6     5.348     7.231     8.558    10.645    12.592    15.033    16.812
     7     6.346     8.383     9.803    12.017    14.067    16.622    18.475
     8     7.344     9.524    11.030    13.362    15.507    18.168    20.090
     9     8.343    10.656    12.242    14.684    16.919    19.679    21.666
    10     9.342    11.781    13.442    15.987    18.307    21.161    23.209
    11    10.341    12.899    14.631    17.275    19.675    22.618    24.725
    12    11.340    14.011    15.812    18.549    21.026    24.054    26.217
    13    12.340    15.119    16.985    19.812    22.362    25.472    27.688
    14    13.339    16.222    18.151    21.064    23.685    26.873    29.141
    15    14.339    17.322    19.311    22.307    24.996    28.259    30.578
    16    15.338    18.418    20.465    23.542    26.296    29.633    32.000
    17    16.338    19.511    21.615    24.769    27.587    30.995    33.409
    18    17.338    20.601    22.760    25.989    28.869    32.346    34.805
    19    18.338    21.689    23.900    27.204    30.144    33.687    36.191
    20    19.337    22.775    25.038    28.412    31.410    35.020    37.566
    21    20.337    23.858    26.171    29.615    32.671    36.343    38.932
    22    21.337    24.939    27.301    30.813    33.924    37.659    40.289
    23    22.337    26.018    28.429    32.007    35.172    38.968    41.638
    24    23.337    27.096    29.553    33.196    36.415    40.270    42.980
    25    24.337    28.172    30.675    34.382    37.652    41.566    44.314
    26    25.336    29.246    31.795    35.563    38.885    42.856    45.642
    27    26.336    30.319    32.912    36.741    40.113    44.140    46.963
    28    27.336    31.391    34.027    37.916    41.337    45.419    48.278
    29    28.336    32.461    35.139    39.087    42.557    46.693    49.588
    30    29.336    33.530    36.250    40.256    43.773    47.962    50.892
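The large-$n'$ rule quoted in the note to Table B can be turned around to estimate a χ² value for a given P. The sketch below does this under the assumption that P is the probability of exceeding the tabulated value, which is how the entries of the table behave; the function name is hypothetical, and the normal deviate is taken from Python's standard `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def chi2_critical_approx(dof, p):
    """Approximate chi-square value exceeded with probability p.

    Treats sqrt(2 * chi2) - sqrt(2 * dof - 1) as a unit normal deviate,
    so chi2 is roughly (z + sqrt(2 * dof - 1))**2 / 2.
    """
    z = NormalDist().inv_cdf(1.0 - p)          # one-sided normal deviate
    return 0.5 * (z + math.sqrt(2.0 * dof - 1.0)) ** 2

approx_30_05 = chi2_critical_approx(30, 0.05)  # table entry: 43.773
approx_30_50 = chi2_critical_approx(30, 0.50)  # table entry: 29.336
```

At 30 degrees of freedom, the edge of the tabulated range, the estimates differ from the table entries by a few tenths; the rule is intended for larger numbers of degrees of freedom.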
t TEST OF SIGNIFICANCE BETWEEN TWO SAMPLE MEANS ($\bar{x}_1$ AND $\bar{x}_2$)



Degrees of
freedom *P = 0.9       0.8       0.7       0.6       0.5       0.4

     1               0.325     0.510               1.000
     2               0.289     0.445               0.816
     3     0.137     0.277     0.424               0.765
     4     0.134     0.271     0.414               0.741
     5     0.132     0.267     0.408               0.727
     6     0.131     0.265     0.404               0.718
     7     0.130     0.263     0.402               0.711
     8     0.130     0.262     0.399               0.706     0.889
     9     0.129     0.261     0.398               0.703     0.883
    10     0.129               0.397                         0.879
    11                         0.396               0.697     0.876
    12     0.128     0.259     0.395               0.695     0.873
    13     0.128     0.259     0.394                         0.870
    14     0.128     0.258     0.393               0.692     0.868
    15     0.128     0.258     0.393               0.691     0.866
    16     0.128     0.258     0.392               0.690     0.865
    17     0.128     0.257     0.392                         0.863
    18               0.257     0.392               0.688     0.862
    19               0.257     0.391               0.688     0.861
    20     0.127     0.257     0.391               0.687     0.860
    21     0.127     0.257     0.391               0.686     0.859
    22     0.127     0.256     0.390                         0.858
    23     0.127     0.256     0.390                         0.858
    24     0.127     0.256     0.390               0.685     0.857
    25     0.127     0.256     0.390               0.684     0.856
    26     0.127     0.256     0.390               0.684     0.856
    27     0.127     0.256     0.389               0.684     0.855
    28     0.127     0.256     0.389               0.683     0.855
    29     0.127     0.256     0.389               0.683     0.854
    30     0.127               0.389               0.683     0.854
     ∞   0.12566   0.25335   0.38532   0.52440   0.67449   0.84162

* P is the probability of having t this large or larger in size by chance. (This Table is
taken from Table IV of R. A. Fisher, Statistical Methods for Research Workers, 12th ed.,
1954, published by Oliver & Boyd Ltd., Edinburgh, by permission of the author and
publishers.)

Degrees of
freedom      0.3       0.2       0.1      0.05      0.02      0.01

     1     1.963     3.078     6.314    12.706    31.821    63.657
     2     1.386     1.886     2.920     4.303     6.965     9.925
     3     1.250     1.638     2.353     3.182     4.541     5.841
     4     1.190     1.533     2.132     2.776     3.747     4.604
     5     1.156     1.476     2.015     2.571     3.365     4.032
     6     1.134     1.440     1.943     2.447     3.143     3.707
     7     1.119     1.415     1.895     2.365     2.998     3.499
     8     1.108     1.397     1.860     2.306     2.896     3.355
     9     1.100     1.383     1.833     2.262     2.821     3.250
    10     1.093     1.372     1.812     2.228     2.764     3.169
    11     1.088     1.363     1.796     2.201     2.718     3.106
    12     1.083     1.356     1.782     2.179     2.681     3.055
    13     1.079     1.350     1.771     2.160     2.650     3.012
    14     1.076     1.345     1.761     2.145     2.624     2.977
    15     1.074     1.341     1.753     2.131     2.602     2.947
    16     1.071     1.337     1.746     2.120     2.583     2.921
    17     1.069     1.333     1.740     2.110     2.567     2.898
    18     1.067     1.330     1.734     2.101     2.552     2.878
    19     1.066     1.328     1.729     2.093     2.539     2.861
    20     1.064     1.325     1.725     2.086     2.528     2.845
    21     1.063     1.323     1.721     2.080     2.518     2.831
    22     1.061     1.321     1.717     2.074     2.508     2.819
    23     1.060     1.319     1.714     2.069     2.500     2.807
    24     1.059     1.318     1.711     2.064     2.492     2.797
    25     1.058     1.316     1.708     2.060     2.485     2.787
    26     1.058     1.315     1.706     2.056     2.479     2.779
    27     1.057     1.314     1.703     2.052     2.473     2.771
    28     1.056     1.313     1.701     2.048     2.467     2.763
    29     1.055     1.311     1.699     2.045     2.462     2.756
    30     1.055     1.310     1.697     2.042     2.457     2.750
     ∞   1.03643   1.28155   1.64485   1.95996   2.32634   2.57582
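The last row of the table (infinite degrees of freedom) can be checked independently, on the standard assumption that Student's t then reduces to a unit normal deviate and that P counts deviations of either sign ("this large or larger in size"). The sketch below uses `statistics.NormalDist` from the Python standard library; the function name is hypothetical.

```python
from statistics import NormalDist

def t_infinite_dof(p):
    """Normal deviate whose absolute size is exceeded by chance with probability p."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)   # two-sided, so half of p in each tail

# Entries of the last row of the table, keyed by P.
last_row = {0.9: 0.12566, 0.5: 0.67449, 0.1: 1.64485, 0.05: 1.95996, 0.01: 2.57582}
largest_difference = max(abs(t_infinite_dof(p) - t) for p, t in last_row.items())
```

The differences should be no larger than the rounding of the tabulated five-decimal values.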
Table D

F TEST FOR EQUALITY OF VARIANCES

Table D (continued)

Rows: degrees of freedom for lesser mean square.
Columns: degrees of freedom for greater mean square.

    12    14    16    20    24    30    40    50    75   100   200   500     ∞

  2.38  2.33  2.29  2.23  2.19  2.15  2.11  2.08  2.04  2.02  1.99  1.97  1.96
  3.45  3.35  3.27  3.16  3.08  3.00  2.92  2.86  2.79  2.76  2.70  2.67  2.65

  2.34  2.29  2.25  2.19  2.15  2.11  2.07  2.04  2.00  1.98  1.95  1.93  1.92
  3.37  3.27  3.19  3.07  3.00  2.91  2.83  2.78  2.71  2.68  2.62  2.59  2.57

  2.31  2.26  2.21  2.15  2.11  2.07  2.02  2.00  1.96  1.94  1.91  1.90  1.88
  3.30  3.19  3.12  3.00  2.92  2.84  2.76  2.70  2.63  2.60  2.54  2.51  2.49

  2.28  2.23  2.18  2.12  2.08  2.04  1.99  1.96  1.92  1.90  1.87  1.85  1.84
  3.23  3.13  3.05  2.94  2.86  2.77  2.69  2.63  2.56  2.53  2.47  2.44  2.42

  2.25  2.20  2.15  2.09  2.05  2.00  1.96  1.93  1.89  1.87  1.84  1.82  1.81
  3.17  3.07  2.99  2.88  2.80  2.72  2.63  2.58  2.51  2.47  2.42  2.38  2.36

  2.23  2.18  2.13  2.07  2.03  1.98  1.93  1.91  1.87  1.84  1.81  1.80  1.78
  3.12  3.02  2.94  2.83  2.75  2.67  2.58  2.53  2.46  2.42  2.37  2.33  2.31

  2.20  2.14  2.10  2.04  2.00  1.96  1.91  1.88  1.84  1.82  1.79  1.77  1.76
  3.07  2.97  2.89  2.78  2.70  2.62  2.53  2.48  2.41  2.37  2.32  2.28  2.26

  2.18  2.13  2.09  2.02  1.98  1.94  1.89  1.86  1.82  1.80  1.76  1.74  1.73
  3.03  2.93  2.85  2.74  2.66  2.58  2.49  2.44  2.36  2.33  2.27  2.23  2.21

  2.16  2.11  2.06  2.00  1.96  1.92  1.87  1.84  1.80  1.77  1.74  1.72  1.71
  2.99  2.89  2.81  2.70  2.62  2.54  2.45  2.40  2.32  2.29  2.23  2.19  2.17

  2.15  2.10  2.05  1.99  1.95  1.90  1.85  1.82  1.78  1.76  1.72  1.70  1.69
  2.96  2.86  2.77  2.66  2.58  2.50  2.41  2.36  2.28  2.25  2.19  2.15  2.13

  2.13  2.08  2.03  1.97  1.93  1.88  1.84  1.80  1.76  1.74  1.71  1.68  1.67
  2.93  2.83  2.74  2.63  2.55  2.47  2.38  2.33  2.25  2.21  2.16
Accuracy (precision), 4, 6f, 29, 31
vs. precision, 1, 6, 31, 142 

Analysis of variance; see F- 
distribution 

Approximations (methods), 13, 15f, 
20, 69f, 76, 101f; see also
Binomial expansion, Taylor’s 
expansion 

Average (arithmetic), 8, 43, 45, 48, 
51, 56, 60, 65, 73f, 84, 94, 97f, 
107f, 111, 113, 127f, 156, 158, 
162f, 168, 172f, 177, 182f, 189, 
193f, 201f 

Average deviation, 77 

Bar graph; see Histogram 

Binomial distribution, 42f, 50, 56, 

64f, 72, 74f, 77f, 144f, 149f, 158 

Binomial expansion, 13, 17, 20, 41, 
46, 145 

Chauvenet’s criterion; see Rejection 
of observations 

Chi-square distribution and testing, 
157f, 172, 200f 

Combinations (unordered groups), 
39f, 49f, 149 

Computers, use of, 3f, 10f, 24, 90f

Condition equations, 98f, 114f 

Consistency of results, 127f, 132f, 
138f, 141, 144, 197; see also 
Chi-square, F-distribution,
Range, Rejection, Student’s 
t distribution 


Correlation coefficient, 188f 
sample, 193, 204 
Curve fitting, 23f, 29f; see also 
Mathematical functions 
black thread method of, 24f, 29f 
Curve plotting, 21f, 48 
residual, 26, 29 
rules for, 22, 25f 

Degrees of freedom, 106, 115, 163f, 
166f, 171, 174, 178f, 181f, 
185f, 196, 199, 201f 
Desk calculators, use of, 4, 11, 24, 
89f, 94, 113 

Determinants, 14, 91, 100, 124, 126, 
161f, 219f; see also Variable, 
changes of in integration 
Distribution, 42, 51f; see also 
Frequency distribution, 
Multinomial distribution, 
Probability, Universe 
asymmetric, 45, 47, 52, 55f, 70, 
78, 111, 147, 178f; see also 
Binomial, Chi-square, 

F-, Poisson, and Range 
distributions 
most probable, 61f, 83 
noncontinuous; see Binomial
distribution, Poisson 
distribution 

symmetric, 45, 52, 55, 60f, 63, 

65f, 77f; see also Binomial, 
Normal, Rectangular, and 
Student’s t distributions 
INDEX


Distribution function, 59f, 152f, 155, 
164, 200 

one-dimensional, 143f, 189 
two-dimensional, 144, 188f 

Error bars, 135, 138, 141 
Errors, 31f; see also Distribution, 
Distribution function, 
Fractional errors, Mistakes, 
Propagation of errors, True 
value 

chance (random), 3, 6, 34, 40, 43, 
51f, 82, 85, 135, 143f, 180f 
maximum, 57, 118 
systematic, 3f, 6, 127, 130f, 134f, 
142, 177, 181 
instrumental, 31f, 127
personal, 31f
theoretical, 31f, 141 
Expectation value, 43, 61, 65, 68f, 

74, 105, 108 

for binomial distribution, 43f, 56, 

72, 145f, 170, 200 
for Chi-square distribution, 200 
for normal distribution, 66, 72, 80,
96, 105, 108, 127, 151, 158f, 
168f, 172f, 181, 188f 
for Poisson distribution, 45f, 56, 

146f 

for range distribution, 111, 150f 
Experiments, illustrative, Behr free 
fall, 84f, 91, 94f, 141 
dart dropping, 54f, 58, 60, 66, 80, 
110, 168f, 201f 
focusing, 52f, 55 
Kundt’s tube, 94, 115 
Newton’s Law of Cooling, 23, 29, 
102f 

Experiments, role of, 2 

Factorial (Gamma) function, 41, 

49f, 69, 149f, 164, 175, 178, 

199, 201, 215f, 222 
F-distribution and testing, 177f, 203 
Fractional error, 6, 9f, 15, 123 
Frequency distribution, 48 
Functions; see Mathematical functions 


Graphic analysis, 21f, 90, 101, 114; 
see also Curve fitting, Curve 
plotting 

Histogram, 53f, 58f, 66, 68, 72, 80, 
110 

Identical readings, 107f 
Independent observations, 123, 157f, 
172f, 185 

Laboratory work, 1, 4, 8, 32, 85 
Least squares, linear problem, 

88f, 118 

method of, 82f, 118f, 193f, 198, 
212f 

nonlinear problem, 100f, 115f, 118
Limits of reliability, 79, 108f, 136 
Line of regression, 189f, 193 
Linear correlation; see Correlation 
coefficient 

Linear observation equations; see 
Straight line, Observation 
equations 

Mathematical functions of common 
occurrence, 28, 115, 140f; 
see also Polynomials, Straight 
line 

Mean; see Average, Expectation 
value 

Measure of precision, 65f, 73f, 95f, 
104f, 111, 118, 123, 127, 129f; 
see also Average deviation, 
Chi-square distribution, F- 
distribution, Probable error, 
Standard deviation 
Mistakes, 3, 32, 53, 55, 108; see also 
Rejection of observations 
Most probable value (or event), 24, 
43, 45, 48, 51, 56f, 61, 65, 67, 

70f, 73, 83, 97, 99f, 102, 113f,
118, 138f, 147 

Multinomial distribution, 148f, 152, 
158, 200 

Nonlinear observation equations, 27f, 
115f; see also Polynomials 
Normal (Gaussian) distribution, 21,
33, 52, 59f, 73, 75f, 80f, 83f, 

96, 108, 111, 146f, 150f, 156, 
158f, 162f, 168f, 171, 188f, 

199f, 207f, 209 

Normal equations, 87f, 91, 98f, 

124f, 226 

Normalization, 64, 154, 163f, 176, 

192, 196, 207f 

Null hypothesis, 180f, 193, 197f, 202 

Numbers, types of, 10, 12, 14f, 31; 
see also Rounding off, 
Significant figures 

Observation equations, 86f, 92f, 98f, 
107, 118, 124f, 142; see also 
Identical readings 

Parameters, of observation equations, 
27f, 67, 94, 143; see also 
Average, Observation 
equations 

of distributions, 78, 168, 190; see 
also Expectation value, 
Standard deviation 

Permutations (ordered groups), 

39f, 49f 

Poisson distribution, 45f, 50f, 56, 

64, 68f, 75, 77f, 146f 

Polynomials, 28, 88, 113f, 116, 137f, 
140f 

Precision; see Accuracy, Measure of 
precision 

Probability, 34f, 73; see also

Distribution; Distribution 
function; Experiments, 
illustrative, dart dropping 
analytical (method), 34f, 42f, 47, 
49f, 65, 68, 74, 80f 
compound (joint), 50, 58, 61f, 71f,
83, 96, 149, 159f, 166f, 172f, 
177f, 190, 194 

elementary (a priori), 34f, 43, 45,
47, 61, 145, 148, 158f, 170 
experimental (method), 34f, 41f, 

49, 56, 58, 68 

Probable error, 21, 73, 76f, 130, 155 


Propagation of error, 118f; see also 
Weights 
in products, 122 
in sums, 122 

Radioactive counting, 46f, 51, 146f, 
200 

Random variable; see Variable 
Range distribution and testing, 111f,
116, 150f, 158, 163, 200, 214 
Rectangular distribution, 75, 107, 200 
Rejection of observations, 57, 108f, 
116f, 135f, 138 

Residuals, 21, 56, 61f, 67, 83, 86, 88, 
102, 104f, 108f, 111, 113, 128, 
130, 132f, 158f, 163, 168, 182f, 
194 

large, 53, 55, 81, 108f, 135f; see 
also Mistakes, Rejection of 
observations 

Rounding off, 7f, 12f; see also 

Numbers, Significant figures 

Significant figures, 10f, 15f, 90, 137;

see also Rounding off 
Simultaneous equations, 14, 89f, 93, 
98, 223f; see also Determinants
Smooth curves; see Curve plotting
Standard deviation, 74f, 96f, 104f, 
143f 

for binomial distribution, 74, 145, 
170, 172, 199 

for chi-square distribution, 200 
for normal distribution, 74, 97f, 
106f, 113f, 118f, 151, 159f, 

176, 188f, 201, 209 
for Poisson distribution, 74, 146f 
for range distribution, 154f 
for rectangular distribution, 74 
for parameters of observation 

equations, 106f, 112f, 116, 162 
Straight line, 24f, 27f, 91f, 95, 113f, 
116, 126, 132f, 136, 138, 140, 
188f 

Student’s t distribution, 172f, 196f, 
201f, 204; see also Correlation
coefficient, sample 
Taylor’s expansion, 18f

for more than one variable, 19, 60, 
101, 103 

True value (or error), 7, 21, 56f, 61f, 
67, 82, 104f, 119, 151, 160f, 172f, 
188f, 193f; see also Universe 

Universe (true or parent distribution), 
57, 59f, 67, 82, 96, 105, 108, 
119f, 127f, 156, 159, 172, 176, 
180f, 188, 193f, 203

Variable, changes of in integration, 
153, 159f, 166f, 174f, 192, 

195f, 215, 217f 


dependent, 22, 92, 143 
independent, 22, 92, 126, 143 
evenly spaced, 94f, 114f 
random, 144, 189 
Variance, 177f, 203 

Weights, of observations, 6, 91, 93, 
95f, 100, 103f, 114f, 124f, 
133, 137f, 183 

of parameters of observation 
equations, 104f, 122, 124f, 
129f, 162, 212f 